DeepSeek R1 and the moment China showed it

I woke up to a group chat on fire.

Every AI researcher I know was sending the same link. DeepSeek had released R1, a reasoning model, and its benchmark scores were sitting right next to OpenAI’s o1. Not below. Next to.

That alone would be notable. But the part that made me sit up in bed and start reading the paper at 6am was the hardware. DeepSeek reportedly trained on NVIDIA H800s, not H100s. The H800 is a cut-down variant of the H100 with reduced chip-to-chip bandwidth, built to comply with the original US export rules. In other words, the kind of chips that the export controls still allowed China to buy at the time, because they were considered “not advanced enough” to train frontier models.

Constraints, it turns out, are creative directors.

The numbers that broke the narrative

The US chip export strategy was built on a simple thesis: if you limit access to the best GPUs, you limit the ability to train the best AI models. More compute equals better models. Therefore, less compute equals worse models.

DeepSeek’s response was to get creative with less. Mixture-of-experts architecture. Better data curation. More efficient training recipes. Clever engineering at every layer of the stack. They didn’t brute-force their way to a frontier model. They finessed their way there.
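To make the mixture-of-experts idea less abstract, here is a minimal sketch of top-k routing in plain Python. It is a toy under stated assumptions, not DeepSeek’s router: the experts are scalar functions, the gate scores are hard-coded rather than produced by a learned network, and every name in it (top_k_route, moe_forward, gate_logits) is hypothetical. The point it illustrates is that only a few experts run for each input, so compute per token is a fraction of what the total parameter count suggests.

```python
import math

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the k selected experts; the other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Hypothetical toy setup: 8 "experts", each a simple scaling function.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Assumed gate scores; a real model computes these from the input.
gate_logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

routes = top_k_route(gate_logits, k=2)
active_fraction = len(routes) / len(experts)
output = moe_forward(1.0, experts, gate_logits, k=2)

print("selected experts:", [i for i, _ in routes])
print("fraction of experts run:", active_fraction)
```

With 2 of 8 experts active, this toy does a quarter of the expert work per input while keeping all 8 experts’ worth of parameters available. That trade is the general appeal of the architecture; how any particular lab tunes it is a separate matter.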

The cost estimates are what really got people talking. Where a comparable model from a US lab might cost $100-500 million to train, DeepSeek reportedly spent under $10 million. If those numbers hold (and I have some healthy skepticism about self-reported training costs), they don’t just challenge the assumption that you need the best chips. They challenge the assumption that you need the most money.
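Part of my skepticism comes from how these headline figures are usually built: GPU-hours multiplied by an assumed rental price. Here is a minimal sketch of that arithmetic. Every input is a hypothetical number I picked so the outputs land in the ranges above; none of them is a reported figure from DeepSeek or any US lab.

```python
def training_cost_usd(gpu_count, days, usd_per_gpu_hour):
    """Rental-style estimate: total GPU-hours times an assumed hourly price."""
    gpu_hours = gpu_count * days * 24
    return gpu_hours * usd_per_gpu_hour

# Hypothetical inputs, chosen only to illustrate the arithmetic.
small_run = training_cost_usd(gpu_count=2_000, days=60, usd_per_gpu_hour=2.0)
large_run = training_cost_usd(gpu_count=20_000, days=90, usd_per_gpu_hour=3.0)

print(f"small run: ${small_run:,.0f}")
print(f"large run: ${large_run:,.0f}")
print(f"ratio: {large_run / small_run:.1f}x")
```

Notice what a formula like this leaves out: failed runs, earlier experiments, salaries, data work, and the cost of buying the hardware in the first place. A number produced this way can be accurate about one training run and still understate what it took to get there.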

Why this matters beyond the benchmarks

I’ve been writing about the AI race for years, and for most of that time, the frame was simple: whoever has the most GPUs wins. NVIDIA chips as the new oil. Stockpiles and export controls and geopolitical chess.

That frame isn’t wrong, exactly. More compute does help. NVIDIA’s H100 and B200 are genuinely superior hardware. But DeepSeek R1 is evidence that the relationship between compute and capability isn’t as linear as we assumed.

Think about it this way. The US spent years building a wall to limit China’s access to advanced chips. DeepSeek didn’t try to climb the wall. They made the wall less relevant by showing you don’t need what’s on the other side, at least not at the current frontier.

That’s a problem if your entire strategy depends on the wall working.

The open question

Here’s what I keep coming back to. If constraints bred this much creativity at the current scale of AI, what happens when the models get bigger? When the compute requirements grow by 10x or 100x? Does the efficiency gap that DeepSeek demonstrated scale up? Or does raw compute eventually win because the problems get so large that cleverness alone can’t compensate?

I don’t know. Nobody does.

But I do know this: the assumption that AI leadership belongs to whoever writes the biggest check is weaker today than it was yesterday. And the assumption that export controls can contain AI capability just took a serious hit.

Public leaderboards now have a Chinese open-weight model sitting next to the best American closed-source model. A year ago, that sentence would’ve sounded unrealistic. Today it’s just a fact.

What I’m watching

I’m watching the response from US labs. Not their public statements. Their hiring patterns. Their architecture choices. If DeepSeek’s approach works at this scale, the smartest researchers in Silicon Valley are already studying it, already adapting it, already incorporating the ideas that made it possible.

The best response to a competitor’s breakthrough isn’t to restrict them. It’s to learn from them.

I’m also watching the next generation of export controls. If GPUs that fall under the current thresholds are sufficient for frontier models, the definition of “controlled hardware” needs to change. And changing it means restricting more chips, which means more geopolitical friction, which means more incentive for China to develop its own chip industry from scratch.

The loop tightens.

I don’t think there’s a clean answer here. I think the world just got more complicated, and the people in charge of making it simpler are running out of levers.

DeepSeek didn’t just release a model. It released a question: what if the compute advantage doesn’t matter as much as we thought?

I’m sitting with that question. It’s a heavy one.
