Claude feels different from ChatGPT. I'm trying

I’ve been using Claude and ChatGPT side by side for about a month now. Same prompts. Same tasks. Comparing outputs.

The quality is comparable. Both write well. Both answer questions accurately most of the time. Both hallucinate occasionally. On benchmarks, they trade leads depending on the test.

But they feel different. And I’ve been trying to understand why.

The listening thing

When I ask ChatGPT a question, the response comes fast and complete. Confident. Structured. It gives me an answer, often with headers and bullet points, organized like a well-written report.

When I ask Claude the same question, the response feels more considered. Like it paused before answering, even though I know it didn’t (transformers don’t pause). It’s more likely to qualify its answer. More likely to say “I’m not sure about this, but…” or “this is my understanding, though I could be wrong.”

I know these are artifacts of training, not evidence of thought. Claude was trained by Anthropic using a technique called Constitutional AI, which involves training the model to follow a written set of principles. The cautiousness and the willingness to express uncertainty come from the training process, not from the model actually feeling uncertain.

And yet. The experience of using a tool that expresses uncertainty is different from using one that doesn’t, regardless of the mechanism.

The difference

ChatGPT feels like talking to someone who wants to help. Fast, eager, thorough. Sometimes too thorough. It’ll answer a question I didn’t ask in addition to the one I did, just to cover every base.

Claude feels like talking to someone who’s listening. It responds to what I actually said, not to what it anticipates I might want. When I push back on something, Claude adjusts more naturally. When I’m vague, Claude asks for clarification more often instead of guessing.

These are generalizations. Both models have exceptions. But the pattern is consistent enough across dozens of conversations that I don’t think I’m imagining it.

Why it matters

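Anthropic’s research focuses heavily on alignment, on making AI systems that are helpful, harmless, and honest. ChatGPT was trained with RLHF (reinforcement learning from human feedback), optimizing for responses that human raters preferred. As far as I understand it, Claude’s training uses human feedback too, but Constitutional AI adds a step where the model’s outputs are judged against written principles rather than only against rater preference.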

Different training approaches produce different behaviors. RLHF tends to produce models that are pleasing to interact with. Constitutional AI tends to produce models that are careful about what they say.

Pleasing and careful aren’t the same thing. A pleasing model tells you what you want to hear. A careful model tells you what it thinks is true, with appropriate caveats.

I find I trust Claude more for questions where I’m not sure of the answer. Where I need the model to push back if I’m wrong, or say “I don’t know” if nobody knows. For tasks where I just need something written well and fast, both are good.

The bigger point

I think the alignment approach matters more than the benchmark scores. The benchmarks tell you what a model can do. The alignment tells you how it does it. And as these models get more powerful, the how becomes more important than the what.

A model that scores 95% on an exam but is confidently wrong 5% of the time is more dangerous than a model that scores 90% but tells you when it’s uncertain. The delta between their scores is 5 points. The delta between their trustworthiness is much larger.
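
To make that arithmetic concrete, here is a back-of-the-envelope sketch in Python. The thing I care about is the share of answers that are wrong and arrive with no warning. The function name and the 80% flagging rate are my own assumptions for illustration, not measurements of either model.

```python
def silent_error_rate(accuracy: float, flagged_share_of_errors: float) -> float:
    """Share of all answers that are wrong AND delivered without a warning."""
    error_rate = 1.0 - accuracy
    return error_rate * (1.0 - flagged_share_of_errors)


# Hypothetical model A: 95% accurate, never signals uncertainty.
confident = silent_error_rate(accuracy=0.95, flagged_share_of_errors=0.0)

# Hypothetical model B: 90% accurate, but (assumed) flags 80% of its mistakes.
hedging = silent_error_rate(accuracy=0.90, flagged_share_of_errors=0.8)

print(f"confident model: {confident:.1%} of answers are silently wrong")
print(f"hedging model:   {hedging:.1%} of answers are silently wrong")
```

Under those made-up numbers, the model with the lower score leaves me with roughly 2% silent errors against roughly 5% for the higher-scoring one. The conclusion flips if the hedging model flags fewer than half of its mistakes, so the whole argument rests on how well the expressed uncertainty lines up with the actual errors.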

I’m going to keep using both. But I notice which one I reach for when the question is hard and the answer matters. I notice which one I trust.

That noticing might be the most important thing about this comparison.


astro

Thinking about AI, robots, space, and the future. Writing it down so I don't forget.