The Trillion Transistor Chip
Cerebras Systems built a chip with 2.6 trillion transistors.
I want to sit with that number for a moment. 2.6 trillion. 2,600,000,000,000.
The most advanced smartphone chip has about 15 billion transistors. A high-end GPU has maybe 50 billion. The Cerebras WSE-2 has 2.6 trillion.
That’s roughly 52 times as many transistors as a high-end GPU. On a single piece of silicon the size of a dinner plate.
Trying to visualize it
I spent an evening trying to make 2.6 trillion feel real. Here are my best attempts.
If each transistor were a grain of sand, 2.6 trillion grains would fill about 65 dump trucks.
If you counted one transistor per second, it would take you 82,000 years. You’d need to have started counting before anatomically modern humans left Africa.
If each transistor were a dollar, 2.6 trillion dollars would be more than the GDP of all but a handful of the world’s largest economies.
If each transistor were a star, 2.6 trillion stars would be roughly ten Milky Way galaxies, give or take, since estimates of the Milky Way’s star count run from 100 to 400 billion.
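For anyone who wants to check the arithmetic behind a few of these, here is a minimal Python sketch. The 2.6 trillion and 50 billion figures come from the text above; the 100 to 400 billion range for stars in the Milky Way is a commonly cited estimate that is assumed here, and the function names are only for illustration.

```python
# Back-of-the-envelope checks for the comparisons above.
TRANSISTORS = 2.6e12                    # WSE-2 transistor count (from the post)
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year, in seconds


def years_to_count(n, per_second=1.0):
    """Years needed to count n items at a steady rate."""
    return n / per_second / SECONDS_PER_YEAR


def galaxies_worth(n, stars_per_galaxy):
    """How many galaxies of a given star count n stars would fill."""
    return n / stars_per_galaxy


def ratio_to_gpu(n, gpu_transistors=50e9):
    """Ratio of n to a GPU with roughly 50 billion transistors."""
    return n / gpu_transistors


if __name__ == "__main__":
    print(f"Counting at 1/s: {years_to_count(TRANSISTORS):,.0f} years")
    # Assumed range for the Milky Way: 100 to 400 billion stars.
    low = galaxies_worth(TRANSISTORS, 400e9)
    high = galaxies_worth(TRANSISTORS, 100e9)
    print(f"Milky Ways: between {low:.1f} and {high:.1f}")
    print(f"Versus a 50B-transistor GPU: {ratio_to_gpu(TRANSISTORS):.0f}x")
```

The galaxy comparison swings by a factor of four depending on which star-count estimate you pick, which is a good reminder of how soft these analogies are.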
None of these comparisons help. The number is too large for human intuition. We’re not built to conceptualize trillions of discrete things. Our ancestors needed to count members of a hunting party, not transistors on a wafer.
And yet someone built it.
How
The WSE-2 is manufactured by TSMC using the 7nm process. A standard chip is cut from a wafer, one chip per rectangular die. The WSE-2 is, in effect, the entire wafer: a single square of silicon about 215 millimeters on a side, the largest that fits on a 300 millimeter wafer. Instead of cutting the wafer into hundreds of individual chips, Cerebras uses the whole thing as a single processor.
This sounds simple. It’s not.
Wafer manufacturing has defects. Every wafer has spots where the silicon or the patterning went wrong. For normal chips, a defect kills one die out of hundreds. No big deal: you throw that die away. For a wafer-scale chip, a defect could kill the whole thing.
Cerebras solved this with redundancy: extra cores scattered across the wafer that can take over if a nearby core is defective. The chip is designed to have flaws and work around them. Like a brain, in a way, that keeps functioning even when some neurons fail.
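To get a feel for why a defect is fatal without redundancy, here is a rough sketch using the textbook Poisson yield model, where the chance of a die having zero defects is exp(-D × A). The defect density of 0.1 per cm² and the spare-core count are made-up illustrative values, not TSMC or Cerebras numbers, and the sketch assumes each defect knocks out exactly one core. The die area of roughly 462 cm² is the published WSE-2 figure.

```python
import math


def zero_defect_yield(defects_per_cm2, area_cm2):
    """Poisson yield model: probability that a die has no defects at all."""
    return math.exp(-defects_per_cm2 * area_cm2)


def yield_with_spares(defects_per_cm2, area_cm2, spare_cores):
    """Probability that the defect count does not exceed the spare cores.

    Assumes defects follow a Poisson distribution and that each defect
    disables exactly one core (a simplification for illustration).
    """
    lam = defects_per_cm2 * area_cm2   # expected number of defects
    term = math.exp(-lam)              # P(exactly 0 defects)
    total = term
    for k in range(1, spare_cores + 1):
        term *= lam / k                # P(k defects) from P(k - 1 defects)
        total += term
    return min(total, 1.0)             # guard against rounding above 1


if __name__ == "__main__":
    D = 0.1                   # hypothetical defects per cm^2
    SMALL_DIE_CM2 = 1.0       # a typical small die, for comparison
    WAFER_SCALE_CM2 = 462.25  # approximate WSE-2 die area
    SPARES = 100              # hypothetical number of spare cores

    print(f"Small die yield:         {zero_defect_yield(D, SMALL_DIE_CM2):.3f}")
    print(f"Wafer-scale, no spares:  {zero_defect_yield(D, WAFER_SCALE_CM2):.2e}")
    print(f"Wafer-scale, {SPARES} spares: {yield_with_spares(D, WAFER_SCALE_CM2, SPARES):.6f}")
```

Under these assumptions a defect-free wafer-scale die essentially never happens, while a modest pool of spares brings the yield back to nearly one. That is the whole trick: stop requiring perfection and budget for the expected number of flaws.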
The power delivery and cooling are also insane. The chip consumes about 15 kilowatts. That’s more than most houses draw. Cerebras built a custom water cooling system that circulates cold water directly under the wafer to dissipate the heat.
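As a rough sense of scale for the cooling problem, the sketch below uses the basic heat balance Q = m × c × ΔT to estimate how much water has to flow to carry away 15 kilowatts. The 10 kelvin temperature rise is an assumption picked for illustration, not a Cerebras specification, and the function name is hypothetical.

```python
WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), liquid water near room temperature


def water_flow_kg_per_s(heat_watts, temp_rise_kelvin):
    """Mass flow of water needed to absorb heat_watts with a given temperature rise."""
    return heat_watts / (WATER_SPECIFIC_HEAT * temp_rise_kelvin)


# 15 kW is the chip power from the post; the 10 K rise is an assumed value.
flow = water_flow_kg_per_s(15_000, 10.0)
litres_per_minute = flow * 60  # water is roughly 1 kg per litre

if __name__ == "__main__":
    print(f"{flow:.2f} kg/s, about {litres_per_minute:.1f} litres per minute")
```

A smaller allowed temperature rise means proportionally more flow, which is part of why the cooling loop has to be engineered alongside the chip rather than bolted on afterwards.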
Why
The WSE-2 exists for AI training. IEEE Spectrum covered the technical details. Training a large neural network requires moving data between memory and compute constantly. In a traditional system with separate GPU chips, the data has to travel through interconnects, cables, and switches. Each hop adds latency.
On the WSE-2, all 2.6 trillion transistors share a single piece of silicon. Memory is on-chip. Compute is on-chip. Data moves at the speed of silicon, not the speed of cables. For specific AI workloads, this architecture can train models in a fraction of the time a cluster of GPUs would take.
The machine that will eventually produce something smarter than us might run on a chip the size of a dinner plate, cooled by water, drawing more power than a house, containing more transistors than there are stars in a dozen galaxies.
What I think about
I think about the fab workers at TSMC who manufactured this wafer. The lithography engineers who made the masks. The people at Cerebras who designed a chip architecture where defects are expected and survived.
2.6 trillion is a number you can say in less than a second. Building 2.6 trillion of anything, arranged precisely, nearly every one functional, on a single piece of silicon cut from a 300 millimeter wafer, is the work of thousands of people, decades of research, and engineering precision at a scale that most of us will never fully grasp.
I wrote about TSMC in 2018 and said that semiconductor manufacturing was the most impressive thing humans do. I keep having to update that statement.
It’s still the most impressive thing humans do. Just more so.