Figure 02 can hold a conversation while making

I’ve watched the Figure demo video five times now. Each time I notice something new.

The setup is simple. A person stands at a counter across from Figure 02. They talk. The robot listens, processes, responds verbally, and then acts physically. “Can I have something to eat?” the person asks. The robot scans the counter, identifies an apple, picks it up, and hands it over.

That’s it. That’s the demo.

And it completely rewired something in my brain.

What’s actually happening

The robot is running a language model (powered by OpenAI) for conversation and reasoning. It has vision models for object detection and spatial awareness. And it has learned motor control for grasping and manipulation.

These three systems are running simultaneously. The robot is listening, seeing, thinking, and moving at the same time. It’s explaining what it’s doing while it does it. “I see an apple on the counter. Let me grab that for you.”
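One way to picture "talking while acting" is a small concurrency sketch. This is only an illustration, not Figure's or OpenAI's actual software: the names `Detection`, `choose_object`, `speak`, `grasp`, and `respond` are all hypothetical, the "reasoning" is a keyword check standing in for a language model, and the sleeps stand in for real speech and motion.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical output of a vision model: what it saw and where."""
    label: str
    edible: bool
    position: tuple  # (x, y, z) in metres, made up for illustration


def choose_object(utterance, detections):
    """Stand-in for language-model reasoning: a food request maps to an edible item."""
    wants_food = "eat" in utterance.lower()
    for detection in detections:
        if wants_food and detection.edible:
            return detection
    return None


async def speak(text, log):
    """Pretend to talk; the sleep stands in for audio playback."""
    log.append(("speech_start", text))
    await asyncio.sleep(0.02)
    log.append(("speech_end", text))


async def grasp(target, log):
    """Pretend to move; the sleep stands in for the arm motion."""
    log.append(("grasp_start", target.label))
    await asyncio.sleep(0.01)
    log.append(("grasp_end", target.label))


async def respond(utterance, detections):
    """Pick a target, then narrate and act concurrently rather than in sequence."""
    log = []
    target = choose_object(utterance, detections)
    if target is None:
        await speak("I don't see anything that fits.", log)
        return log
    await asyncio.gather(
        speak(f"I see the {target.label}. Let me grab that for you.", log),
        grasp(target, log),
    )
    return log


if __name__ == "__main__":
    counter = [
        Detection("cup", False, (0.1, 0.2, 0.0)),
        Detection("apple", True, (0.3, 0.2, 0.0)),
    ]
    for event in asyncio.run(respond("Can I have something to eat?", counter)):
        print(event)
```

In the printed log the grasp starts before the sentence finishes, which is the whole point: the talking and the moving overlap instead of taking turns.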

Two years ago, each of these capabilities existed separately in labs. A robot arm could pick things up. A chatbot could hold a conversation. A vision system could identify objects. Combining all three into a humanoid form factor that operates in real time is the part that’s new.

The hands

The hands are what got me.

Watch the video closely. When the robot reaches for the apple, the fingers don’t just close. They adjust. There’s a pre-shaping of the grip based on the object’s estimated size and position. The approach is careful. Deliberate. The kind of movement you make when you’re picking up something that belongs to someone else.

That deliberateness isn’t accidental. It’s trained. But it looks like consideration. It looks like the robot is being gentle on purpose. And that appearance of intent, whether or not actual intent exists, changes how you react to the machine.
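The robot's pre-shaping is learned, not hand-coded, but the underlying idea fits in a few lines. The sketch below is a hand-written rule under made-up assumptions: the function name `preshape_aperture`, the 2 cm margin, and the 10 cm maximum opening are all hypothetical, not taken from Figure's hardware.

```python
def preshape_aperture(object_width_m, margin_m=0.02, max_aperture_m=0.10):
    """Open the hand slightly wider than the estimated object, before contact.

    All numbers are illustrative assumptions, in metres.
    """
    if object_width_m <= 0:
        raise ValueError("estimated object width must be positive")
    # Leave a little clearance, but never ask for more than the hand can open.
    return min(object_width_m + margin_m, max_aperture_m)


if __name__ == "__main__":
    # An apple estimated at 7 cm across gets an opening of about 9 cm.
    print(preshape_aperture(0.07))
```

A trained policy replaces the fixed margin with whatever it has picked up from data, which is presumably where the careful, deliberate look comes from.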

I found myself saying “thank you” out loud while watching the video. To a recording of a robot. On my laptop screen.

The timeline

In 2022, Figure 01 walked across a stage. Barely. It looked like a toddler’s first steps.

In 2023, Figure 01 made coffee by watching a human do it. The coffee was bad. The process was slow. But it learned from observation.

In 2024, Figure 02 has a conversation with you, understands what you want, identifies the right object, picks it up, and hands it to you. While explaining its reasoning out loud.

The steps in that timeline seemed far apart when they happened. Compressed into a list like this, the acceleration is obvious. The gap between “can barely walk” and “can hold a conversation while performing physical tasks” was about 18 months.

Where does this curve go in another 18 months?

What I keep thinking about

There’s a moment in the demo where the person says something slightly ambiguous. They don’t point. They don’t specify. The robot interprets. It reasons about context. It makes a choice.

That choice might be wrong sometimes. The demo is a demo. The lab is a controlled environment. Real kitchens have clutter. Real conversations have sarcasm, trailing sentences, gestures that mean one thing in one culture and another thing in another.

But the foundation is there. A robot that can see, hear, think, talk, and manipulate objects in physical space. The gap between “works in a lab demo” and “works in your kitchen” isn’t a physics gap. It’s an engineering gap. And engineering gaps close. That’s what engineers do.

I think about my grandmother’s kitchen. The counter she’d lean against while telling stories. The radio always on. The way she’d hand you a plate without looking, just knowing where your hands were from decades of shared meals.

Figure 02 isn’t that. It’s nowhere close. But for the first time, I can see the path between here and there. And the path is shorter than I thought.
