GPT-4.5 and the diminishing returns of scale

OpenAI released GPT-4.5 and it’s good. It’s better than GPT-4o. The vibes are better. The reasoning is sharper. The creative writing is more natural.

But the jump from GPT-4 to GPT-4.5 is smaller than the jump from GPT-3.5 to GPT-4. Noticeably smaller. The benchmarks confirm what the experience suggests: the returns on scale are diminishing.

The scaling wall

For three years, the AI industry operated on a simple equation: more compute plus more data equals better models. This was true from GPT-2 to GPT-3 to GPT-4. Each generation was dramatically better because each generation was dramatically bigger.

GPT-4.5 is bigger than GPT-4. It used more compute, more data, and cost more to train. And it is better. Just not proportionally better. The cost went up a lot. The capability went up a little.

This is a familiar pattern in most technologies. The first doublings produce huge gains. The later doublings produce incremental ones. It’s why a modern car isn’t 100x better than a 2005 car despite costing more to develop. The easy improvements get captured first.
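
To make the "later doublings buy less" point concrete, here is a toy sketch. It assumes loss falls as a simple power law in compute, which is roughly the shape reported in the scaling-law literature. The function names and the constants `a` and `alpha` are made up for illustration and are not fitted to GPT-4, GPT-4.5, or any real model.

```python
def loss(compute, a=1.0, alpha=0.05):
    """Hypothetical power-law loss curve: a * compute^(-alpha).

    The constants are illustrative assumptions, not measured values.
    """
    return a * compute ** (-alpha)


def gains_per_doubling(doublings, a=1.0, alpha=0.05):
    """Absolute loss reduction bought by each successive doubling of compute."""
    losses = [loss(2.0 ** k, a, alpha) for k in range(doublings + 1)]
    return [losses[k] - losses[k + 1] for k in range(doublings)]


gains = gains_per_doubling(10)
for k, gain in enumerate(gains, start=1):
    # Each doubling costs twice as much compute as the one before it.
    print(f"doubling {k:2d}: compute x{2 ** k:5d}, loss drops by {gain:.4f}")
```

Under this assumption every doubling cuts loss by the same fraction, so the absolute improvement shrinks each time while the compute bill for the next doubling is twice the last one. That is the "cost went up a lot, capability went up a little" shape, in miniature.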

What comes next

If “just make it bigger” is hitting a wall, the question becomes: what else is there?

Reasoning. Models that think before they answer, like o1, show that changing how a model is trained and how much compute it spends at answer time can produce capability jumps that raw scale can’t. DeepSeek showed that clever engineering can match brute-force scale at a fraction of the cost.

Tool use. Models that can search the web, run code, read files, and interact with APIs are more capable than models that just generate text, regardless of parameter count.

Specialization. A smaller model trained specifically for medical diagnosis might outperform a general model 100x its size in that domain.

I think we’re at an inflection point. The era of “scale is all you need” produced incredible results. But the era that follows, where the gains come from architecture, reasoning, tool integration, and specialization, might produce results that are more practically useful even if the benchmarks don’t look as dramatic.

The emotional part

I’ll be honest: part of me is relieved. The “scale is all you need” era favored whoever had the most money. If brute force always wins, then only the richest companies matter. If cleverness matters too, the field stays open.

DeepSeek showed this from one direction. GPT-4.5’s diminishing returns show it from another. The future of AI probably isn’t about who builds the biggest model. It’s about who builds the smartest one.

That feels right to me. Not because I know it’s true. Because a world where cleverness beats capital is a world I prefer to live in.
