Devin, the AI software engineer, and why I'm

A startup called Cognition Labs announced Devin, which they’re calling “the first AI software engineer.” The demo video shows it receiving a task in natural language, writing code, testing it, debugging errors, deploying to a server, and submitting the result. End to end. No human in the loop.

The video is impressive.

I’m skeptical anyway.

The demo problem

I’ve been watching AI product demos since GPT-3. Here’s what I’ve learned: the gap between a demo and daily use is exactly the sum of the edge cases the demo didn’t show.

The Devin demo shows a clean task with a well-defined outcome. Build a website with these specs. Fix this bug. Deploy to this server. These are the kinds of tasks that work well in controlled environments. The code is simple. The requirements are clear. The infrastructure cooperates.

Real software engineering isn’t like that. Real software engineering is: the staging database has different column names than production. The API you need is rate-limited and undocumented. The CSS works in Chrome but breaks in Safari. The deployment failed silently because someone changed a permission three weeks ago and nobody noticed.

Real software engineering is 20% writing code and 80% dealing with everything that goes wrong around the code.

Can Devin handle a production database migration at 3 AM when the ORM is generating SQL that deadlocks on a table with 40 million rows? I don’t know. The demo doesn’t show that. Demos never show that.

What I think is real

I believe Devin (or something like it) can write boilerplate code. I believe it can scaffold a project, implement a straightforward feature from a clear spec, and run tests. I believe it’s better than GitHub Copilot for multi-step tasks because it can plan, not just autocomplete.

That’s genuinely useful. A lot of programming is boilerplate. If an AI handles the boring parts, that frees up humans for the weird parts. The debugging. The architecture decisions. The “why is this test failing on Tuesdays” mysteries.

What I think is hype

The framing of “AI software engineer” implies replacement. It implies a machine that can do what a senior engineer does: hold the full system in its head, make tradeoff decisions, notice when something feels wrong before it is wrong, and explain to a product manager why the “simple” feature will take three weeks.

I don’t see that in the demo. I see a very capable code generator with planning capabilities. That’s not a software engineer. That’s a power tool.

A nail gun isn’t a carpenter.

Why I’m paying attention anyway

Because I’ve been wrong before. I was wrong about ChatGPT’s impact. I was wrong about how fast coding assistants would improve. I was wrong about how quickly people would adopt AI tools in their daily workflow.

So maybe Devin or its descendants will actually remove the need for some software engineers in some contexts. Maybe the “simple” tasks that junior engineers cut their teeth on will be automated first, and the industry will restructure around a smaller number of senior engineers supervising AI systems.

That’s a possibility I take seriously.

But I’m waiting for data. Real user data. From real teams, working on real codebases, with real deadlines and real bugs. Not a demo. Data.

Anthropic is building something similar. So is OpenAI. The coding agent race is on. And the winner won’t be the one with the best demo. It’ll be the one whose tool still works when everything else is breaking.

That’s what engineering is. The thing that works when everything else is breaking.
