Anthropic's computer use feature: Claude can
Anthropic released a feature called computer use and I think it’s more important than most people realize.
Claude can see your screen. Not “read your code.” Not “access your files through an API.” See. Your. Screen. The pixels. The layout. The buttons. The rendered output of whatever you’re looking at. And it can click, type, scroll, and interact with your computer the way a human sitting at your desk would.
I watched it debug a CSS issue by looking at the rendered page, noticing the spacing was wrong, inspecting the element, identifying the conflicting rule, and fixing it. Through the screen. Like a colleague looking over your shoulder.
That’s different from anything AI has done before.
How it works
Previous AI coding assistants worked through text. You paste code, the AI reads the text, generates text back. The interaction is entirely in the domain of code-as-text.
Computer use operates through screenshots and mouse/keyboard actions. The AI sees what you see. It can read text on the screen but also interpret layout, colors, positioning, error messages in dialogs, and visual state (is this button disabled? is this dropdown open? is this page loading?).
It’s the difference between asking a remote colleague to fix something by describing it over the phone versus having them sit at your desk and see the problem themselves.
What I tested
I let Claude navigate to a website I was building, fill out a form, and submit it. It found a form field that wasn’t validating correctly. It looked at the error message, switched to the code editor, found the validation logic, fixed the bug, saved the file, refreshed the page, and verified the fix worked.
The whole process took about 90 seconds. A human would have done it in about the same time. But the human would be me, and I was watching the AI do it instead.
I also let it set up a development environment. Install packages, configure files, run commands. It read terminal output the way I would, noticed when something failed, and adapted. When a package installation threw an error about a missing dependency, it read the error message from the terminal (visually, from the screen) and installed the missing dependency.
Why this is different
Every previous AI tool was an abstraction layer. You gave it text, it gave back text. The AI operated in a symbolic space: code, language, structured data. The real world of screens, buttons, and visual interfaces was invisible to it.
Computer use removes that abstraction. The AI operates at the same level you do: the screen. It sees what a user sees. It clicks what a user clicks. This means it can interact with any application, any website, any tool that has a visual interface, without needing an API or integration.
Can’t get API access to a tool? The AI can use it through the screen, the same way you would. Need to interact with a legacy system that only has a web interface? The AI can see and use it.
The implications for automation are significant. The long tail of tasks that can’t be automated because they require interacting with arbitrary software through a visual interface just became automatable. Not perfectly. Not reliably enough for critical operations. But the boundary moved.
The trust question
I noticed something about my own reaction. When Claude was navigating through code in a text editor, I was comfortable. When it started clicking buttons on websites and filling out forms, I got nervous.
Same AI. Same capability level. But something about watching it interact with the visual world, the world I interact with, triggered a different response. It felt more real. More present. More like something that could make a mistake in a place where mistakes are visible and consequential.
I think this reveals something about how we think about AI boundaries. We’re comfortable with AI processing text because text feels abstract. When AI moves into the visual, physical-interaction layer, it feels like it’s crossed into our space. The screen is our interface to the digital world, and sharing it feels intimate in a way that sharing a text prompt doesn’t.
That feeling isn’t rational. But it’s real. And it’ll matter as these tools become more capable.
An AI that can see your screen and act on it isn’t just a better coding assistant. It’s a fundamentally different kind of tool. The kind that raises questions about agency, trust, and what it means when your machine can use other machines on your behalf.
I let Claude see my screen. It fixed my CSS. I’m still processing what that means beyond the CSS.
Related thinking:
astro
Thinking about AI, robots, space, and the future. Writing it down so I don't forget.