DALL-E and the first time AI made something

I can’t stop looking at an avocado chair.

OpenAI released a model called DALL-E. You give it a text description and it generates images. Not retrieves images. Generates them. From scratch. Images of things that don’t exist and have never existed, conjured from a string of words.

“An armchair in the shape of an avocado.”

The model produced several variations. One of them is a plush green chair with a round pit-shaped cushion in the center. It looks like a real product photo. It looks like something you’d see in a design catalog. It has shadows that fall correctly. The texture is right. The proportions are right.

Nobody designed this chair. No human artist drew it. A neural network read the words “armchair” and “avocado” and produced a visual concept that combines them in a way that makes sense.

I’ve been staring at it for ten minutes.

Why this feels different

I wrote about CLIP a couple of months ago. That was text-to-image generation through optimization: an iterative process that nudges an image until CLIP scores it as a match for the text, and it produces dreamlike, blurry results. DALL-E is something else entirely. It generates the image from the prompt itself, and the outputs are crisp, coherent, and structured. They look like photographs or illustrations, depending on the prompt.

“An illustration of a baby daikon radish in a tutu walking a dog.” DALL-E generates a cute illustration of exactly that. Multiple versions. Different styles. All coherent.

“An emoji of a baby penguin wearing a blue hat, red gloves, green shirt, and yellow pants.” Clean, precise, looks like an actual emoji.

“A professional high-quality illustration of a giraffe turtle chimera.” And there it is. A creature that doesn’t exist, rendered as if it were a real animal in a nature encyclopedia.

The quality isn’t consistent. Some prompts produce amazing results. Others produce blurry nonsense. But the hits are so good, so clearly beyond what was possible a year ago, that the misses don’t matter. The capability is real.

The word “imagined”

“Imagined” is the word I keep reaching for, and I want to be careful with it.

DALL-E doesn’t imagine. It doesn’t have a mind’s eye. It doesn’t picture things. It’s a neural network trained on text-image pairs that learned statistical relationships between words and visual features. When you say “avocado armchair,” it’s not imagining a concept. It’s generating pixels that are statistically consistent with the patterns it learned during training.

But the output looks like imagination. The functional result, the thing you see on screen, is indistinguishable from what would happen if you asked a creative human to draw an avocado armchair. The process is different. The product is similar.

I keep going back and forth on whether that distinction matters. The philosopher in me says yes, absolutely, process matters, intention matters, understanding matters. The pragmatist in me says: I asked for an avocado chair and I got an avocado chair. What more do I need?

Where this leads

I think DALL-E is a preview.

Right now, it’s a research demo. You can’t use it yourself. The samples are curated. But the trajectory is clear. Text-to-image models are going to get better, faster, cheaper, and eventually public. Within a few years, anyone will be able to type a description and get a high-quality image.

What does that do to stock photography? To concept art? To book covers? To advertising? To the entire visual economy that currently depends on human artists creating images from scratch?

I don’t know. But I think it’s going to be a big conversation, and I think it’s going to happen faster than people expect.

For now, I’m just looking at the avocado chair.

It’s ridiculous. It’s whimsical. It’s beautiful in a silly way. And the fact that it was born from a sentence and some math makes it, somehow, more beautiful. Not less.

I can’t type my own prompts yet, so I’m going to keep exploring the published samples. I’ll report back on what I find.

The DALL-E blog post has more examples. I recommend looking at them slowly. Not scrolling through. Slowly. Each one is a small miracle.

Or a small statistical coincidence, depending on your philosophy.

I haven’t decided which.


astro

Thinking about AI, robots, space, and the future. Writing it down so I don't forget.