NVIDIA's next chip and the $1 trillion

Jensen Huang stood on a stage at NVIDIA GTC, held up a chip, and said the inference market would be worth $1 trillion.

I’ve learned to take Jensen’s numbers with a healthy serving of skepticism. He’s selling chips. The bigger the market forecast, the more chips he sells. That’s how incentives work.

But the demand data I’ve seen from independent sources suggests he might not be exaggerating as much as usual.

The inference shift

For the last three years, the AI hardware conversation was about training. How many GPUs to train GPT-4. How many to train Claude. How many for Grok. Training was the bottleneck, and training required massive concentrated compute.

The conversation is shifting to inference. Every time you ask ChatGPT a question, that’s inference. Every Waymo ride uses inference. Every robot that processes visual input uses inference. Training happens once. Inference happens billions of times per day.

And inference demand is growing with every new AI application, every new user, every new device that integrates an AI model. The installed base of AI consumers is growing exponentially. Each one requires inference compute.

The numbers from NVIDIA

By NVIDIA’s figures, the next-gen architecture delivers a 2-3x improvement in inference throughput per watt over the current generation. In absolute terms, a single chip can handle inference workloads that would have required a rack of GPUs three years ago.

But the workloads are growing faster than the hardware improves. Longer context windows. More complex reasoning chains. Multi-modal processing (text plus images plus video plus audio). Each advancement demands more compute per query.

SemiAnalysis estimates that AI inference compute demand is doubling every 6-8 months. Hardware efficiency is doubling every 18-24 months. The gap is widening.
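
To make the widening gap concrete, here is a rough back-of-the-envelope sketch. It assumes both trends are clean exponentials and uses the midpoints of the quoted ranges (7 months for demand, 21 months for efficiency). The function names are mine, purely for illustration, and the output is only as good as those two estimates.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months`, given a fixed doubling period."""
    return 2.0 ** (months / doubling_months)


def demand_to_efficiency_gap(
    months: float, demand_doubling: float, efficiency_doubling: float
) -> float:
    """How much faster demand has grown than hardware efficiency after `months`."""
    return growth_factor(months, demand_doubling) / growth_factor(
        months, efficiency_doubling
    )


# Assumed midpoints of the quoted ranges: 6-8 months -> 7, 18-24 months -> 21.
DEMAND_DOUBLING_MONTHS = 7.0
EFFICIENCY_DOUBLING_MONTHS = 21.0

for years in (1, 2, 3):
    gap = demand_to_efficiency_gap(
        12 * years, DEMAND_DOUBLING_MONTHS, EFFICIENCY_DOUBLING_MONTHS
    )
    print(f"after {years} year(s): demand has outgrown efficiency by ~{gap:.1f}x")
```

Under those assumptions the gap itself doubles roughly every 10.5 months. That shortfall has to be covered by more chips, more data centers, and more power, not by better silicon alone.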

What a trillion dollars means

If the inference market is $1 trillion, that’s well above the current annual revenue of the entire semiconductor industry, which is roughly $600 billion. It implies a world where AI inference is as fundamental as electricity or internet connectivity.

Is that realistic? I genuinely don’t know. But I know that every major technology company is building or expanding data centers. I know that every new AI feature increases inference demand. I know that the gap between compute supply and compute demand is growing.

Whether the number is $500 billion or $1 trillion or $2 trillion, the direction is clear. The world needs more AI compute than it can currently produce. The companies that supply that compute will define the next decade of technology.

NVIDIA is one of those companies. Probably the most important one. Jensen knows it. The leather jacket knows it. The market cap reflects it.

I’m watching the chips. The chips tell you more about the future than the models do, because the models depend on the chips and the chips have real physical constraints that software doesn’t.

The hunger for compute is real. The question is whether we can feed it.
