Grok 3 and xAI's brute force approach to
Two philosophies are colliding in AI right now, and I can’t stop thinking about the collision.
On one side: DeepSeek, showing you can build frontier models with clever engineering and modest hardware. Efficiency. Elegance. Doing more with less.
On the other: xAI, building a 100,000-GPU cluster in Memphis, Tennessee, and training Grok 3 by throwing an absurd amount of compute at the problem. Scale. Power. Doing more with MORE.
Both approaches are producing good models. That’s the part that keeps me up at night.
The Memphis cluster
100,000 NVIDIA GPUs in a single facility. The electricity consumption alone is staggering. The cooling requirements are a small engineering miracle. xAI didn’t build a data center. They built a power plant with chips in it.
And Grok 3 is good. The benchmarks show it competing with the best from OpenAI and Anthropic. Not winning everything, but winning enough that you can’t dismiss it. The brute force approach works. More compute, more data, better model. The scaling laws hold.
But DeepSeek works too
That’s what makes this interesting. If only one approach worked, the story would be simple. But both work. You can get to the frontier by being efficient OR by being enormous. The question is which scales better from here.
If you’re xAI, you bet the models keep getting better as you add more compute. The scaling laws have held for years. Why would they stop? Build bigger clusters. Train bigger models. The brute force path is predictable and expensive but proven.
If you’re DeepSeek, you bet that there are algorithmic shortcuts that make raw scale less important. Better architectures. Better data. Better training recipes. The efficiency path is unpredictable and cheap but unproven at the next scale.
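Here is a toy way to see why the two bets are hard to tell apart from the outside. It is a sketch, not anyone's real scaling law: I'm assuming loss falls as a power law in "effective compute", that algorithmic efficiency acts as a simple multiplier on raw compute, and every constant below (`a`, `b`, the FLOP budgets) is made up for illustration.

```python
# Toy model only. The functional form and every constant are assumptions
# chosen for illustration, not fitted to any real model.

def loss(raw_compute, efficiency=1.0, a=10.0, b=0.05):
    """Power-law loss in effective compute (raw compute times efficiency)."""
    effective_compute = raw_compute * efficiency
    return a * effective_compute ** -b

baseline = loss(1e24)                          # hypothetical starting budget
scale_bet = loss(1e25)                         # 10x the raw compute
efficiency_bet = loss(1e24, efficiency=10.0)   # same hardware, 10x better recipe

print(f"baseline:       {baseline:.4f}")
print(f"scale bet:      {scale_bet:.4f}")
print(f"efficiency bet: {efficiency_bet:.4f}")
```

In this toy version the two bets land on the same loss (up to floating-point rounding), because the model only sees the product of compute and efficiency. The real disagreement is about which multiplier is cheaper to keep buying, and the sketch says nothing about that.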
Why I can’t pick a side
I keep changing my mind.
One day I think: obviously scale wins. The history of technology is the history of doing more with more. Bigger factories, bigger networks, bigger everything. Scale economics favor the well-funded.
The next day I think: but the history of technology is also the history of disruption from below. The startup that outmaneuvers the incumbent. The elegant hack that makes the brute force approach look wasteful.
Both patterns are real. Both have centuries of precedent. And right now, both are producing frontier AI models.
Maybe the answer is that different applications need different approaches. Maybe massive scale wins for training, and efficiency wins for inference. Maybe the biggest models do things the smaller ones can’t, but the smaller ones do 90% of what matters at 1% of the cost.
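To put that last guess into numbers: this is plain arithmetic on the "90% at 1% of the cost" figure, which is my own hypothetical and not a measurement of any real model. The variable names are just for illustration.

```python
# Hypothetical figures taken straight from the guess above; not measurements.
big_quality, big_cost = 1.00, 1.00       # the biggest model, used as the reference
small_quality, small_cost = 0.90, 0.01   # "90% of what matters at 1% of the cost"

def quality_per_cost(quality, cost):
    """Quality delivered per unit of cost, in the same arbitrary units."""
    return quality / cost

advantage = quality_per_cost(small_quality, small_cost) / quality_per_cost(big_quality, big_cost)
uncovered = big_quality - small_quality  # the share only the big model handles

print(f"small model: about {advantage:.0f}x the quality per unit cost")
print(f"share left to the big model: {uncovered:.0%}")
```

If the guess were right, the smaller model would deliver roughly 90 times more per unit of cost, and the whole argument would come down to how much that remaining 10% is worth.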
I don’t know. I’m watching. That’s all I can do with a question this big. Watch, and resist the urge to pretend I have an answer.
astro
Thinking about AI, robots, space, and the future. Writing it down so I don't forget.