DeepSeek V3 came from China and it's cheap

DeepSeek published their V3 technical report and one number jumped off the page.

$5.6 million. That’s what they claim it cost to train a model that benchmarks competitively with GPT-4-level systems. Not $100 million. Not the rumored $500 million that frontier labs are spending. Five point six million dollars.

I’ve read the paper twice. The number is the compute cost of the final training run: about 2.788 million H800 GPU hours, priced at an assumed rental rate of $2 per GPU hour. It does not include research salaries or earlier experimental runs, so the true total cost is higher. But even if you 5x the number, you’re at $28 million for a frontier-competitive model. That’s still an order of magnitude below what American AI labs are reportedly spending.
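The headline figure is simple arithmetic, so here is a minimal sketch of it. The per-stage GPU-hour totals and the $2 per GPU-hour rental price are the figures as I read them in the technical report; the 5x multiplier is my own padding, not anything DeepSeek claims, and the function name is just for illustration.

```python
# GPU-hour totals per training stage, as listed in the V3 technical report.
H800_GPU_HOURS = {
    "pre_training": 2_664_000,
    "context_extension": 119_000,
    "post_training": 5_000,
}

# The report's own assumption: a rental price, not a measured invoice.
ASSUMED_PRICE_PER_GPU_HOUR = 2.00  # USD


def training_cost(gpu_hours, price_per_hour):
    """Total compute cost in USD: sum of GPU hours times the hourly price."""
    return sum(gpu_hours.values()) * price_per_hour


reported = training_cost(H800_GPU_HOURS, ASSUMED_PRICE_PER_GPU_HOUR)
padded = 5 * reported  # hypothetical overhead for salaries, failed runs, etc.

print(f"reported compute cost: ${reported / 1e6:.3f}M")
print(f"with a 5x overhead:    ${padded / 1e6:.2f}M")
```

The unrounded total is $5.576 million, which is where the $5.6 million headline comes from. Padding it 5x lands just under $28 million.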

If the numbers hold, the implications are massive.

What DeepSeek built

DeepSeek V3 is a mixture-of-experts model with 671 billion total parameters, of which roughly 37 billion are active for any given token. This architecture is efficient by design: most of the model sleeps most of the time. You get the knowledge of a huge model with the inference cost of a much smaller one.
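To make "most of the model sleeps" concrete, here is a toy sketch of top-k expert routing. It is not DeepSeek's router: the expert count, the value of k, and the random scores standing in for a learned gating network are all assumptions chosen to keep the example small.

```python
import random

NUM_EXPERTS = 8  # toy value, not V3's real expert count
TOP_K = 2        # toy value: experts activated per token


def route(scores, k):
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


random.seed(0)
for token in ["The", "cat", "sat"]:
    # Stand-in for a learned router: one score per expert for this token.
    scores = [random.random() for _ in range(NUM_EXPERTS)]
    active = route(scores, TOP_K)
    print(f"{token!r} -> experts {active} run, the other {NUM_EXPERTS - TOP_K} stay idle")

# The same idea at V3's quoted scale: active parameters over total parameters.
active_fraction = 37e9 / 671e9
print(f"active fraction per token: {active_fraction:.1%}")
```

Each token wakes up only its top-scoring experts, so the compute per token tracks the active parameters rather than the total. At the quoted sizes that is roughly 5.5% of the model per token.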

The architecture uses multi-head latent attention, and the custom training pipeline was built to maximize GPU utilization. The team is clearly strong. They published detailed technical documentation. The model weights are publicly available.

On benchmarks, V3 performs at or near GPT-4 level on most standard evaluations. It’s particularly strong on math and coding. The Chinese-language performance is excellent, which makes sense given the training data distribution.

Why the cost matters

The AI industry has been telling a story: building frontier models requires billions of dollars in compute, massive GPU clusters, and the financial backing of the largest technology companies on Earth. Only a handful of organizations can play this game. The future of AI is concentrated in a few hands.

DeepSeek’s cost structure challenges that story.

If you can train a competitive model for under $10 million, then the number of organizations that can participate in frontier AI research expands dramatically. Universities can play. Government labs can play. Well-funded startups in any country can play.

The concentration thesis assumes that scale is the dominant variable. More data, more compute, more money equals better models. DeepSeek suggests that architecture, training efficiency, and clever engineering might matter as much as raw scale.

This is the open-weight argument that Meta has been making with Llama. Not that bigger is better. That smarter is better. And smarter doesn’t require a billion-dollar compute budget.

The geopolitical angle

US chip export controls were designed to prevent China from building frontier AI systems by limiting access to the most advanced GPUs. DeepSeek trained their model on hardware that was available despite those controls (NVIDIA H800s, a reduced-bandwidth variant of the H100 built to comply with the original export rules, and reportedly some workarounds).

If Chinese AI labs can produce frontier models despite hardware restrictions, the strategic logic of export controls gets complicated. The controls slow China down. But they also incentivize exactly the kind of efficiency-focused research that DeepSeek represents. Constraints breed creativity.

I don’t have a strong opinion on whether export controls are good policy. I observe that they’re not achieving the assumed result of a permanent capability gap. The gap is smaller than expected, and it’s shrinking.

What I think

I think the era of “only three companies can build good AI models” is ending. Not because the big labs will get worse, but because the techniques for building competitive models at lower cost are diffusing globally.

The future of AI won’t be one lab in San Francisco with the most GPUs. It’ll be hundreds of labs, in dozens of countries, building models optimized for different languages, cultures, tasks, and cost constraints.

That’s a more interesting future. A more unpredictable one. And possibly a more dangerous one, because diffusion of capability means diffusion of risk.

But it’s the future that DeepSeek just demonstrated is possible. For $5.6 million. From China. Using restricted hardware.

The AI race isn’t what we thought it was. It’s faster, broader, and more distributed than the concentration thesis predicted. I find that both exciting and humbling. My predictions about the shape of this industry are proving wrong at a regular cadence. I should be used to that by now.

I’m not.
