One Chaotic Model-Year Later

It has been roughly one full “model-year” since ChatGPT arrived. The unit is arbitrary, of course, but so are most anniversaries. Humans like turning time into little rituals. You could just as reasonably celebrate every 10,000th day alive instead of every birthday: for most people that comes around only three times, at roughly ages 27, 54, and 82. Those map surprisingly well onto three modern life stages: getting established, the midlife crack-up, and heading for the grave. We have ten fingers, yet apparently even that isn’t enough to make us count in thousands.
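The ages quoted for the 10,000-day milestones are easy to check. A minimal sketch, assuming a mean Gregorian year of 365.2425 days and counting age in completed years; the names `DAYS_PER_YEAR` and `age_at_day` are illustrative only:

```python
DAYS_PER_YEAR = 365.2425  # assumption: mean Gregorian year length

def age_at_day(day_count: int) -> float:
    """Age in years on a given day of life."""
    return day_count / DAYS_PER_YEAR

# The 10,000th, 20,000th, and 30,000th days of life.
milestones = [10_000 * k for k in (1, 2, 3)]

# int() truncates, giving age in completed years.
ages = [int(age_at_day(d)) for d in milestones]
print(ages)
```

A fourth milestone at 40,000 days would fall past age 109, which is why three is the practical limit.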

If I had to compress this past year of large language models into five four-character idioms, loosely translated, they would be: one dominant player, many voices competing, the washout phase, one model charging ahead, and finally division of labor.

From one-company dominance to a crowded field

Before ChatGPT, major companies had already been betting hard on AI. Under the deep learning wave, computer vision was the first area to truly break into the mainstream. In China, the so-called AI "four little dragons" were all focused on vision, and facial recognition had already become a stock example in stories about national technological strength.

Then ChatGPT appeared, and suddenly the adjective attached to artificial intelligence became generative. Public attention swung sharply toward natural language processing. Strictly speaking, though, text-to-image systems such as Stable Diffusion in the second half of 2022 fit the classic idea of generative modeling more literally: they generate images by repeatedly denoising noise. Transformer-based language models are closer to an extremely elaborate form of context-conditioned completion, where prompts and prior text steer what comes next.

That distinction mattered less than the market reality. From late 2022 onward, OpenAI was the one clear giant. Around then, big tech fully woke up to the strategic importance of large language models and began hoarding GPUs to train their own systems. Through the second half of 2023 and into the first half of 2024, the first batch of those in-house models started shipping.

But by that point OpenAI had already moved on to GPT-4. Many new entrants were decent, some even impressive, yet measured against GPT-4 the field of real contenders was still narrow. Claude stood out somewhat on programming tasks, but once the initial excitement settled, the real choices were fewer than the headlines suggested.

China, however, developed a somewhat different market dynamic. Since many overseas services were hard or impossible to access, domestic models found room to become genuinely popular. Doubao and Kimi gained especially high visibility, and Kimi in particular seemed to be spending heavily on advertising. Even at that stage, the way these models were marketed already hinted at where things were going next: some emphasized long-context handling, others multimodality. Still, the main users remained fairly concentrated—mostly programmers and students. The so-called breakout into the mainstream was often more visible in news coverage than in everyday life.

DeepSeek changed the mood

At the end of 2024, DeepSeek landed like a cavalry charge.

Many people explain its rise by saying its performance was close to leading proprietary models. That helped, but it was not the decisive factor. The real turning point was open source.

Most major-company models had gone down the closed route. Meta’s Llama and Alibaba’s Qwen were already well known in the open-model community, but because they still lagged cloud-based flagship models, they often felt more like interesting toys than serious replacements.

DeepSeek had already built credibility among programmers before V3. For many developers, its earlier V2 was effectively an upgrade to what they could run locally. When V3 arrived, the initial buzz was not overwhelming, and its low training cost was not some secret revelation either, since the community already knew. The real shift came with R1.

Some background helps here. In 2024, OpenAI introduced the o1 reasoning model. Since it was initially available only to paying users, public discussion around it stayed relatively limited, though open-source imitations appeared quickly. The simplest version of that idea was basically prompt engineering: guide a model to think in multiple rounds before answering. In broad terms, a reasoning model first talks to itself, performs a kind of dialectical internal analysis, and only then produces an answer. Functionally, that resembles multi-turn dialogue compressed inside the model. That is also why token usage shoots up—the “thinking” stage replaces the user repeatedly asking follow-up questions.
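The prompt-engineering version of that idea can be sketched in a few lines. This is only an illustration of the "think in rounds, then answer" pattern, not a description of how o1 or R1 work internally. `fake_model` is a hypothetical stand-in for a real model call, and tokens are approximated by whitespace-separated words.

```python
from typing import Callable

def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns canned text."""
    if "Final answer" in prompt:
        return "42"
    return "Let me reconsider the question from another angle."

def answer_with_thinking(
    question: str, model: Callable[[str], str], rounds: int = 3
) -> tuple[str, int]:
    """Let the model 'talk to itself' for several rounds, then answer.

    Returns the final answer and a rough count of generated tokens
    (approximated by whitespace-separated words).
    """
    transcript = f"Question: {question}\n"
    produced = 0
    for i in range(rounds):
        # Each round feeds the growing transcript back into the model.
        thought = model(transcript + f"Thought {i + 1}:")
        transcript += f"Thought {i + 1}: {thought}\n"
        produced += len(thought.split())
    final = model(transcript + "Final answer:")
    produced += len(final.split())
    return final, produced

direct, direct_tokens = answer_with_thinking("What is 6 x 7?", fake_model, rounds=0)
reasoned, reasoned_tokens = answer_with_thinking("What is 6 x 7?", fake_model, rounds=3)
print(direct_tokens, reasoned_tokens)
```

Even in this toy, the reasoned call generates many times more text than the direct one for the same final answer, which is the token cost described above.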

At the moment R1 arrived, reasoning models still felt like premium goods. Then DeepSeek not only open-sourced its own R1, but also distilled it into multiple smaller models. Overnight, open-model users went from coarse grain to fine dining.

Open source also brought something more important than convenience: technological equalization. From late 2024 into the opening months of 2025, demand for locally deploying R1 or distilled R1 variants spread rapidly across the internet. Local deployment eased one of the biggest obstacles to broad adoption: privacy concerns. It also gave users a sense of control over the model itself. Very quickly, experiments spread from individual hobbyists to institutions and government bodies.

Before DeepSeek, one major reason large language models had not reached truly broad adoption was the combination of privacy worries and performance gaps. Open models had always offered the privacy advantage of local deployment, but before R1 they still trailed cloud leaders too clearly. With R1, that gap shrank enough that there are now probably countless corners of the world running local models for strange, highly specific, perfectly practical tasks.

The limits of the new hero

That said, my own experience with R1 is that it hallucinates quite a bit and can feel a little neurotic. V3 is steadier.

And once the local deployment boom hit at the start of the year, large companies plainly felt the pressure. Previously, these local users had still depended heavily on the big companies’ APIs and paid for tokens there. Now it was as if the market had switched to a new energy source.

From early 2025 onward, major players visibly accelerated. But this time the goal was no longer the old dream of a single universal model that beats ten rivals at once. The age of classical heroism in large language models is fading. What is arriving instead is division of labor.

The signs had been there for a while. OpenAI built GPT-4, yet GPT-5 has remained conspicuously delayed. My guess is not that the company got lazy, but that the models from some later training runs either hallucinated too much, regressed in capability, or ran into deeper architectural or data limits. Either the model structure needs a more radical shift, or the pool of useful training data is no longer as abundant as before.

While progress in base models has begun to look more incremental, the application market has already delivered its verdict: cooperation and specialization work. The "many voices competing" phase had already foreshadowed this in each company’s marketing language. Even mixture-of-experts (MoE) architectures, which route each input to a few specialized sub-networks, hinted at the same thing from another direction. Models were beginning to seek niches within their own ecosystems.
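For readers unfamiliar with MoE, the routing idea can be sketched as follows. This is a toy illustration under stated assumptions, not any particular model's implementation: the gate scores are hard-coded rather than learned, the "experts" are plain Python functions, and the names `top_k_route` and `moe_forward` are hypothetical.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_scores: list[float], k: int = 2) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}

# Hypothetical experts: each is just a simple function of the input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]

def moe_forward(x: float, gate_scores: list[float], k: int = 2) -> float:
    """Weighted sum of the outputs of only the selected experts."""
    weights = top_k_route(gate_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())

routing = top_k_route([0.1, 2.0, 1.5, -1.0], k=2)
print(sorted(routing))  # indices of the experts that were activated
output = moe_forward(3.0, [0.1, 2.0, 1.5, -1.0])
print(output)
```

Only two of the four experts run for this input. That sparsity is the internal division of labor: most of the network sits idle on any given token while a few specialists do the work.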

AI has entered its niche era

“Niche” is a strange and useful concept. I first encountered it in college ecology, and at the time I never imagined it would become such a fashionable business word. Back then I understood it simply: in a food web, each species survives by getting good at one part of the hunt, reducing direct competition through specialization.

That idea now fits large language models surprisingly well.

Researchers may still want to forge an all-purpose golden elixir, but the market has already discovered a harsher rule: if your model cannot do at least one thing exceptionally well, it may not earn the fuel needed for the next training run.

Right now, the division of labor among large language models can be summarized as six arts:

Coding
Reasoning
Multimodality
Memory (long context)
On-device deployment
Real-time response

A new all-around champion may be harder to come by, but excellence in any one of these is enough to build a real market.

Coding barely needs explanation. The number of paying users for tools like Cursor or Copilot says plenty.

Reasoning is tightly tied to deep research workflows and multi-round feedback. For people who work primarily with text, this is not a toy feature but heavy artillery.

Multimodality matters because not all useful information is text, and because non-text modalities are also the incubation ground for the next generation of models.

Memory, especially long context, is foundational for agent-based applications. Retrieval-augmented generation can also be understood as a kind of memory architecture.
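One way to see retrieval as memory is a minimal sketch, assuming a tiny in-memory note store and a crude word-overlap score in place of real embeddings. The notes, the `retrieve` and `build_prompt` helpers, and the prompt layout are all hypothetical.

```python
import re

# Hypothetical "memories" the system has stored about a user.
MEMORY = [
    "The user prefers answers in British English.",
    "The user's project uses Python 3.11 and PostgreSQL.",
    "The user's cat is named Mochi.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, notes: list[str], top_n: int = 1) -> list[str]:
    """Rank notes by words shared with the query (a stand-in for embedding similarity)."""
    q = tokenize(query)
    ranked = sorted(notes, key=lambda n: len(q & tokenize(n)), reverse=True)
    return ranked[:top_n]

def build_prompt(query: str, notes: list[str]) -> str:
    """Prepend the retrieved notes to the question before it reaches the model."""
    context = "\n".join(f"- {n}" for n in retrieve(query, notes))
    return f"Known facts:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("Which database does my project use with Python?", MEMORY)
print(prompt)
```

The model itself remembers nothing here. The relevant note is looked up and pasted into the context on each call, which is why retrieval and long context are two routes to the same capability.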

On-device models matter for local interaction on machines and in homes. If smart home interfaces are ever to move beyond buttons and become truly microphone-first, lightweight local models will be central.

Real-time capability matters in scenarios where latency is the product: translation, meeting summarization, live assistance, and similar uses.

These six capabilities can of course be combined in more complex systems. But even taken separately, each of them already looks commercially viable.

This kind of specialization does not map neatly onto humans

The reason is simple: in each of these specialized areas, large language models have already surpassed ordinary human performance, or at least produce output that passes a practical Turing test well enough that it becomes hard to tell whether the author was a trained expert or a model.

You cannot easily hire one engineer who is comfortable across every programming language. You cannot hire an office worker who can generate a decent report in minutes on demand. You cannot hire an artist who effortlessly switches among many visual styles. You cannot hire a personal assistant who knows you in extreme detail. You cannot hire a swarm of little servants to orbit you all day. You certainly cannot hire a real-time interpreter who effortlessly understands every language you might throw at them.

These are visible replacement zones, but also visible demand-creation zones.

Human roles do not disappear just because models get better. Plastic surpassed paper and glass on many practical dimensions, yet people still associate it with cheapness. That same sense of cheapness still clings to content generated by large language models. However good a prepared meal becomes, there will always be people who insist that a chef’s flame and private recipe are in another league.

But technological equalization has never cared much about those aesthetic snobberies. First you let most people access what was previously fenced off by professional barriers or price barriers. Only afterward do people debate purity.

It is also worth admitting that some occupational prestige in modern society was inflated to begin with. Take programmers and civil servants: for the past decade or two, the premium attached to those roles has been unusually obvious. In a world where nearly everyone can code a little and draft a competent report, some of that premium should naturally fade.

That does not stop people from offering advice as if the world were static and old formulas still held. Better to look at the underlying problems and use cases a profession exists to solve. If the problem remains and the scenario remains, the profession remains. If the problem can now be solved cheaply, or the scenario that created the problem disappears, the profession will fade on its own.

What people need to learn now

Even in a specialized form, these models have an underlying knowledge base at roughly GPT-4 level or better, which already puts them beyond the average undergraduate, and probably beyond many graduate students, in broad knowledge recall.

That creates an awkward situation for universities. If higher education continues to emphasize knowledge delivery in the old way, it risks graduating students whose stock of explicit knowledge is weaker than the models they use, while also failing to train them in identifying and handling real-world problems.

Paradoxically, as large language models move toward specialization and collaboration, human training may need to move in the opposite direction—toward more integrated individuals. Not all-powerful individuals, but people who can effectively draw from the whole human knowledge commons, with large language models included as one part of that commons.

The more tightly work is tied to concrete situations and specific problems, the more room remains for human initiative. At least for now. Whether that remains true for the next 10,000 days is another question entirely.

History is getting shorter

Under current technological conditions, history itself feels compressed. What counted as advanced technology a few weeks ago may already be obsolete today. Yet society has not built schools capable of teaching history on that timescale. Sometimes we do not even finish reviewing one phase before the next phase has ended.

There is no need for panic, though. Human problems still need human beings involved in solving them. It may be wiser to give up some unnecessary desire for control. Artificial intelligence is, in the main, a good thing. People say they fear it will take jobs, but more often what they really fear is that it will take wages. If technological equalization driven by AI lowers the cost of living, it may also give people more free time.

Whether modern people, shaped by modern systems of discipline, would know how to enjoy that extra life is another question—one even a large language model might hesitate to answer.