I spent a week with GPT-3 and I don’t know what to think

I got GPT-3 API access last Tuesday.

For one week, I’ve been sitting at my desk, typing prompts into the OpenAI playground, and staring at what comes back. I’ve generated thousands of words across dozens of sessions. I’ve asked it to write poetry, explain physics, tell jokes, write code, summarize articles, debate philosophy, and compose fake news articles (to see if it could, not to use them).

Here’s what I know after seven days: I don’t know what to think.

The brilliant parts

I asked it to write a poem about a lonely satellite. Not a famous satellite. Not Voyager or Hubble. Just a generic satellite, forgotten, still orbiting, still transmitting to nobody.

What came back was beautiful. Genuinely beautiful. There was a line about “whispering data to a planet that forgot my name” that made me set down my coffee and reread it three times. The poem had meter. It had imagery. It had a melancholy that felt earned, not performed.

I asked it to explain how LIDAR works to a five-year-old. It said something about shooting tiny invisible beams of light really fast and counting how long they take to bounce back, “like playing the world’s fastest game of catch with a flashlight.” That’s a good explanation. Better than most textbooks.

I asked it to write the opening paragraph of a sci-fi novel about the last chip fabrication plant on Earth. What came back felt like it was written by someone who’d read a lot of Asimov and spent some time thinking about semiconductor manufacturing. It wasn’t great literature. But it was competent, evocative, and structured in a way that suggested understanding of narrative.

These moments are electric. You type a few sentences and something comes back that feels like it was written by a thoughtful person who just happens to live inside your browser tab.

The stupid parts

I asked it when Abraham Lincoln invented the telephone. It told me 1876. Confidently. With a paragraph of supporting detail about Lincoln’s interest in communication technology. Abraham Lincoln did not invent the telephone. Alexander Graham Bell did. And Lincoln was dead by 1876.

I asked it to multiply 47 by 83. It said 3,801. The answer is 3,901.
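For anyone who wants to check that by hand, here is a minimal sketch in Python. It is my own illustration, not something the model produced, and the variable names are made up for the example. It splits 83 into 80 + 3 and adds the partial products.

```python
# Verify 47 * 83 by splitting 83 into 80 + 3 (illustrative names only).
a, b = 47, 83

partial_tens = a * 80    # 47 * 80 = 3760
partial_ones = a * 3     # 47 * 3  = 141
product = partial_tens + partial_ones

gpt3_answer = 3801       # the answer GPT-3 gave in my session
error = product - gpt3_answer

print(product)  # 3901
print(error)    # 100
```

The model's answer has the right number of digits and the right last two digits, which is part of why it reads as plausible at a glance.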

I asked it to tell me about the current president of France. It gave me an answer that was correct as of its training data but was delivered with absolute certainty, with no acknowledgment that information can change, that the present is different from the past, that “now” is a moving target.

I asked it to write a factual summary of the Mars Perseverance mission. It got most of the facts right and then casually mentioned that Perseverance would be accompanied by two other rovers. It would not. There’s one rover. The model just… invented companions for it.

The gap

The gap between GPT-3’s best outputs and its worst outputs is wider than any technology I’ve ever used. It’s not like a calculator that’s accurate or broken. It’s not like a search engine that finds the right page or doesn’t. It exists in a strange middle space where it’s simultaneously impressive and unreliable, sometimes within the same paragraph.

This is the thing I keep coming back to. When GPT-3 is good, it’s so good that you start attributing understanding to it. You start thinking, “it gets this.” And then it says something so confidently wrong that you realize no, it doesn’t get anything. It’s producing sequences of likely words. The sequences are often right because the training data is vast and the patterns are real. But there’s no one home. There’s no understanding behind the curtain.

Or is there?

Gwern’s experiments with GPT-3 creative fiction suggest something more subtle. The model seems to have something like a style. It makes choices that feel deliberate. It has preferences. Not real preferences, I know. But functional ones. Things that look and act like preferences from the outside.

What this means

I think we’re going to spend a lot of time in this gap. Not for weeks or months, but for years. The space between “brilliant” and “stupid” is where all the interesting questions live.

Can you trust it? No. Not for facts. Can you use it? Yes. As a writing partner, a brainstorming tool, a first draft machine. Can it replace a person? Not any person I know. But it can approximate a person well enough to be useful in specific contexts.

And the approximation is getting better.

I’ve spent a week with this thing and the feeling I have is the same one I had watching the Falcon Heavy boosters land. It’s the feeling of seeing something that shouldn’t be possible doing something that shouldn’t work. Except the rocket landing was clean and binary. It either lands or it doesn’t. GPT-3 half-lands, beautifully, and then tips over and catches fire, and then rights itself and sticks the next landing perfectly.

I’m going to keep playing with it. I’m going to keep being amazed and disappointed in the same session. And I’m going to keep trying to figure out what to think.

I’ll let you know when I figure it out.
