
This principle extends beyond corporate life and into the world of AI. Personally, I call the first group producers and the second group promoters. And in the current AI ecosystem, we’re seeing far more promoters than producers — sometimes promoters disguised as producers. This phenomenon starts at the source: academia.

Academia: Paper Mills and Catchy Titles

Academia is becoming a paper mill, driven by the pressure to “publish or perish.” Many students and researchers are scrambling to catch the AI wave, not only for top-tier lab positions but also for immigration advantages (hello, O-1 visas). This pressure has led to a flood of papers with catchy titles like “* is All You Need” or “[X] RAG,” hoping to attract attention from both academic and industry audiences. With RAG alone, there are at least 12 variations.

But beneath the surface, the issues are rampant: citation rings, reproducibility crises, and even outright cheating. Just look at the Stanford students who claimed to have fine-tuned LLaMA 3 to be multimodal, with vision capabilities at the level of GPT-4V, only to be exposed for faking their results. This incident is just the tip of the iceberg, with arXiv increasingly resembling BuzzFeed more than a serious academic repository.

Industry Research

Things aren’t much better in industry research labs. Here, valuable techniques often remain unpublished, kept secret to maintain a competitive edge. It’s reminiscent of the RSA algorithm, which made its three inventors, Ron Rivest, Adi Shamir, and Leonard Adleman, billions in the 1980s. But Clifford Cocks had already invented an equivalent system in the early 1970s inside British intelligence, where it remained classified. The public did not learn of it until its declassification in 1997, 24 years after the invention. Or take Ed Thorp, who invented an option pricing model and quietly traded on it for years until Black and Scholes published a similar formula and won a Nobel Prize. That leading-edge research you wish were published is most probably already someone’s production code.

On the flip side, AI research that does get published by industry labs is often non-critical to production or intended as marketing material—to drive cloud usage or consulting deals. If they’re allowed to publish it, you can bet it’s not their most valuable work.

The AI Echo Chamber

This dynamic has created a wave of AI cheerleaders who pretend to read papers on arXiv but have little understanding of machine learning fundamentals. Armed with catchy titles and flashy charts, these influencers use language models to summarize complex papers they barely understand, spreading misinformation. And many of them are paid to do it—whether it’s corporate PR or helping someone’s visa application along. The noise gets louder, drowning out the real signals.


Impact

In a matter of weeks or months, this amplified noise filters down to LinkedIn, reaching non-technical audiences and leading them to believe AI is far more capable than it is. Hallucination is solved! AI agents are coming for your job! Science fiction is marketed as reality, backed by clickbait.

Meanwhile, data scientists and statisticians who oftentimes lack engineering skills are now being pushed to write Python and “do AI,” often producing nothing more than unscalable Jupyter Notebooks. They’re being sold a fantasy that AI/ML is as simple as a few lines of code, ignoring that these are rigorous engineering disciplines.

This is how we’re headed for another AI winter, just as we saw with the fall of data science, crypto, and the modern data stack. And that’s actually a good thing. The promoters will hop onto the next trendy buzzword, while the real producers will keep moving forward, building a more capable future for AI.

@article{leehanchung,
    author = {Lee, Hanchung},
    title = {AI Winter Is Coming},
    year = {2024},
    month = {09},
    howpublished = {\url{https://leehanchung.github.io}},
    url = {https://leehanchung.github.io/blogs/2024/09/20/ai-winter/}
}