o1 thinks before it answers and that changes

OpenAI released o1 and something fundamental shifted.

Previous language models generate responses by predicting the next token. Token by token, left to right, as fast as the hardware allows. Whatever thinking happens, happens in the generation. You can see the AI “think” because thinking and speaking are the same process.

o1 is different. It reasons first. Before any text appears on your screen, the model is working through the problem internally. Chain of thought. Planning. Checking its own logic. This happens in a hidden scratch space you can’t see. Then, after seconds (sometimes many seconds), the answer appears.

An AI that thinks privately before speaking.

I don’t think people have fully processed how different that is.

What it can do

I gave o1 a math competition problem that I couldn’t solve. Not a simple word problem. A problem from a national math olympiad. I’d spent 20 minutes on it and gotten stuck.

o1 solved it in 14 seconds. The solution was correct. The reasoning was sound, or at least the part OpenAI lets you see was: a summary, not the full chain.

I gave it a logic puzzle that requires working through multiple interdependent constraints. The kind of thing where you need to track six variables simultaneously and test each combination. I can do these, but slowly, with a pencil and paper and a lot of erasing.

o1 did it in 8 seconds. Correct.

I gave it a physics problem involving orbital mechanics. It asked me no clarifying questions. It set up the differential equations, solved them symbolically, and gave a numerical answer. I checked the answer against a known solution. Correct.

What’s different about reasoning models

The distinction between a regular language model and a reasoning model might sound subtle. Both produce text. Both can answer questions. But the quality of output on hard problems is dramatically different.

A regular language model (GPT-4, Claude 3.5) will attempt a math problem by generating the most statistically likely sequence of tokens that looks like a correct solution. Often this works. Sometimes it doesn’t, and the failure mode is a confident, well-formatted wrong answer.

o1 actually works through the problem. It tests approaches. It backtracks when something doesn’t work. It verifies its answer before presenting it. The process is closer to how a human expert solves a hard problem: try, check, revise, try again.

The result is a model that’s dramatically more reliable on problems that require reasoning. Math, logic, coding algorithms, scientific analysis. The domains where “thinking hard” matters more than “knowing a lot.”
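
To make the try, check, revise loop concrete, here is a toy sketch in Python. It is an analogy, not a description of how o1 works internally (OpenAI hasn't published that). It is a plain backtracking search over a made-up three-variable puzzle. The `solve` function, the variable names, and the constraints are all invented for illustration.

```python
from typing import Callable, Optional

Assignment = dict[str, int]
Constraint = Callable[[Assignment], bool]


def solve(variables: list[str], domain: list[int],
          constraints: list[Constraint],
          partial: Optional[Assignment] = None) -> Optional[Assignment]:
    """Depth-first search with backtracking: try a value, check it, undo on failure."""
    partial = {} if partial is None else partial
    if len(partial) == len(variables):
        return dict(partial)  # every variable assigned, every check passed
    var = variables[len(partial)]
    for value in domain:
        partial[var] = value                      # try
        if all(check(partial) for check in constraints):  # check
            result = solve(variables, domain, constraints, partial)
            if result is not None:
                return result
        del partial[var]                          # revise: back out and try the next value
    return None  # nothing worked at this level; the caller has to backtrack


# Hypothetical puzzle: a < b < c, and a is not 1.
# Each constraint passes while the variables it mentions are still unassigned.
def a_before_b(x: Assignment) -> bool:
    return "a" not in x or "b" not in x or x["a"] < x["b"]


def b_before_c(x: Assignment) -> bool:
    return "b" not in x or "c" not in x or x["b"] < x["c"]


def a_is_not_one(x: Assignment) -> bool:
    return x.get("a") != 1


puzzle = [a_before_b, b_before_c, a_is_not_one]
print(solve(["a", "b", "c"], [1, 2, 3, 4], puzzle))
```

The search rejects a = 1 straight away, then abandons several candidate values for b and c before it reaches an assignment that passes every check. A regular model that commits to each token as it goes has no equivalent of the `del partial[var]` line.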

The hidden reasoning

This is the part that makes me uneasy.

OpenAI chose not to show you the full reasoning chain. You get a summary. “Thinking… Considered three approaches. Verified the solution.” But the actual internal monologue, the step-by-step reasoning, is hidden.

I understand why. The reasoning process is messy, verbose, and would be confusing to most users. It might also reveal things about the model’s architecture that OpenAI wants to keep proprietary.

But an AI that thinks thoughts you can’t see is a new kind of thing. Previous models were transparent in a specific way: the output was the process. With o1, there’s a process and an output, and they’re separate. The model has an inner life (functionally, not philosophically) that you don’t have access to.

I’m not saying this is dangerous. I’m saying it’s a category change. We went from “AI that speaks as it thinks” to “AI that thinks, then speaks.” That’s a meaningful line to cross.

The implications for AI development

Reasoning models suggest a new scaling approach. Instead of making models bigger (more parameters, more training data), you make them think longer. More compute at inference time, not training time. The model spends more time on each answer, and the answers get better.

If this approach holds, it changes the economics of AI. Training a huge model is a largely one-time cost that can run to hundreds of millions of dollars. Running inference with more compute is a per-query cost that scales with usage. The cost structure shifts from “build it once, run it cheap” to “build it once, think as hard as the problem requires.”

Hard problems cost more to answer. Easy problems cost less. AI pricing starts to look like hiring an expert: simple questions are cheap, complex analysis is expensive.

I find this strangely appropriate.
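
The pricing idea above can be sketched in a few lines of Python. Everything numeric here is made up: the per-token prices are placeholders, not OpenAI's rates, and the token counts are guesses at what an easy and a hard question might consume. The sketch also assumes hidden reasoning tokens are billed at the same rate as visible output tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-token prices in dollars (placeholders, not real rates)."""
    input_per_token: float
    output_per_token: float


def query_cost(pricing: Pricing, prompt_tokens: int,
               reasoning_tokens: int, answer_tokens: int) -> float:
    """Cost of one query, assuming hidden reasoning is billed like visible output."""
    billed_output = reasoning_tokens + answer_tokens
    return (prompt_tokens * pricing.input_per_token
            + billed_output * pricing.output_per_token)


# Illustrative numbers only.
pricing = Pricing(input_per_token=0.00001, output_per_token=0.00004)

easy = query_cost(pricing, prompt_tokens=100, reasoning_tokens=200, answer_tokens=100)
hard = query_cost(pricing, prompt_tokens=100, reasoning_tokens=20_000, answer_tokens=100)

print(f"easy question: ${easy:.4f}")
print(f"hard question: ${hard:.4f}")
```

The prompt and the visible answer are the same size in both cases. The only thing that changes is how long the model thinks, and that alone drives the difference in cost.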

My discomfort

I’m a person who values understanding the process. When I solve a problem, I want to see the work. I want to know why the answer is what it is, not just what it is.

o1 shows me the answer. Sometimes a summary of the process. But not the work.

I find myself trusting it because it’s been right. But trust based on track record is different from trust based on understanding. I trust my mechanic because he’s been right before. But I’d rather understand the engine myself.

With o1, understanding the engine isn’t an option. The reasoning is hidden. The thinking is private.

An AI that thinks before it speaks. I’m still thinking about what that means. And unlike o1, you can see all of my thinking right here on the page.
