
Open Computer Use
Open-source Computer Use MCP for AI agents
90 followers
Open Computer Use turns local desktop automation into a standard MCP service. It lets Codex, Claude Code, Gemini CLI, opencode, and custom MCP clients inspect apps, click, type, scroll, drag, and take screenshots across macOS, Linux, and Windows. It is open source, npm-installable, and designed to bring the non-intrusive Codex Computer Use experience to any agent stack.


Open Browser Use
apideck
I agree 100%.
Love the idea of standardizing computer use via MCP. It opens up so many possibilities for custom agents. Do you think it could be used to automate data entry from legacy apps directly into something like a structured database or even a spreadsheet?
Open Browser Use
@phatysddev Yes, exactly - that is one of the use cases I am most excited about.
Because Open Computer Use exposes desktop control through MCP, an agent can inspect a legacy app, click through forms, read or copy values, and write them into a structured database or spreadsheet. Beyond MCP, it also supports a CLI, JS/Python/Go SDKs, and Skills, so you can plug it into different agent stacks or build a more custom workflow around it.
For production-ish data entry flows, I would still recommend adding validation steps, screenshot/state checks, retries, and human review for sensitive fields, but the core automation path is supported.
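As a rough illustration of the validation, retry, and human-review steps mentioned above, here is a minimal Python sketch. It does not use the Open Computer Use API: `write` and `read_back` are hypothetical callables standing in for whatever calls your agent stack uses to type into a field and read it back, and `FieldResult` and `enter_field` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FieldResult:
    name: str
    value: Optional[str]   # None when the value could not be verified
    attempts: int
    needs_review: bool


def enter_field(
    name: str,
    expected: str,
    write: Callable[[str], None],   # hypothetical: types the value into the target field
    read_back: Callable[[], str],   # hypothetical: reads the field's current value
    sensitive: bool = False,
    max_attempts: int = 3,
) -> FieldResult:
    """Write a value, verify it by reading it back, and retry on mismatch."""
    for attempt in range(1, max_attempts + 1):
        write(expected)
        if read_back() == expected:
            # Verified; sensitive fields are still queued for a human to confirm.
            return FieldResult(name, expected, attempt, needs_review=sensitive)
    # Never verified: leave the value unset and escalate to a human.
    return FieldResult(name, None, max_attempts, needs_review=True)
```

The point of the design is that a field is only reported as written after a read-back check passes, and anything unverified or sensitive ends up in a review queue instead of being silently accepted.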
VibeAround
The interoperability story makes sense. The thing I would want in practice is a first-class notion of state assertions around actions, not just actions themselves: window X focused, field Y contains Z, screenshot region roughly matches expected state, and per-app permission envelopes.
That feels like the line between a very cool transport layer and something teams can trust for repetitive real work, especially once agents start chaining multiple desktop steps together.
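One way to picture the state assertions and per-app permission envelopes described above is the small Python sketch below. Everything in it is an assumption made for illustration: `GuardedAction`, `execute`, and the envelope dictionary are hypothetical names, not part of Open Computer Use or MCP, and the checks are plain callables you would back with real window and field inspection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


class AssertionFailed(Exception):
    """A precondition or postcondition did not hold."""


class NotPermitted(Exception):
    """The action kind is outside the app's permission envelope."""


@dataclass
class GuardedAction:
    app: str
    kind: str                                   # e.g. "click" or "type"
    run: Callable[[], None]                     # hypothetical: performs the desktop action
    pre: List[Callable[[], bool]] = field(default_factory=list)
    post: List[Callable[[], bool]] = field(default_factory=list)


def execute(action: GuardedAction, envelope: Dict[str, Set[str]]) -> None:
    # Permission envelope: the set of action kinds allowed for each app.
    if action.kind not in envelope.get(action.app, set()):
        raise NotPermitted(f"{action.kind} not allowed in {action.app}")
    # Preconditions, e.g. "window X is focused"; the action is skipped if any fails.
    if not all(check() for check in action.pre):
        raise AssertionFailed("precondition failed; action not run")
    action.run()
    # Postconditions, e.g. "field Y contains Z".
    if not all(check() for check in action.post):
        raise AssertionFailed("postcondition failed after action")
```

When an agent chains several desktop steps, a failed assertion stops the chain at the step where the state diverged, so later steps do not run against an unexpected screen.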
The thing I would look for in this category is not just whether the agent can click a UI, but whether the run leaves enough evidence for a developer to trust it.
A strong first-run demo for a desktop automation MCP is small and inspectable: fresh app state, one approved task, screenshots or traces before and after the action, and a clear boundary around credentials and destructive clicks.
If those artifacts are easy to review, the tool becomes much more useful for real agent workflows because success is no longer just "the model said it worked."
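The before-and-after evidence idea above could look something like this minimal Python sketch. It is a sketch under stated assumptions, not how Open Computer Use records runs: `capture` and `action` are hypothetical callables (a screenshot function returning bytes, and the approved task), and the record layout is invented for this example.

```python
import hashlib
import time
from typing import Callable, Dict


def run_with_evidence(
    task: str,
    capture: Callable[[], bytes],   # hypothetical: returns screenshot bytes
    action: Callable[[], None],     # hypothetical: performs the one approved task
) -> Dict[str, object]:
    """Run one action and return a reviewable record of what happened."""
    before = capture()
    started = time.time()
    error = None
    try:
        action()
    except Exception as exc:
        # Keep the failure in the record instead of losing it.
        error = repr(exc)
    after = capture()
    return {
        "task": task,
        "started_at": started,
        "duration_s": time.time() - started,
        "before_sha256": hashlib.sha256(before).hexdigest(),
        "after_sha256": hashlib.sha256(after).hexdigest(),
        "screen_changed": before != after,
        "error": error,
    }
```

A reviewer can then check the hashes against the saved screenshots and see whether the screen changed at all, which is independent of what the model reports about its own run.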
Curious how you handle window focus switching when multiple apps are open, since that's usually the brittle part in desktop automation. Does the MCP layer abstract that away, or does the agent need to manage it?
Local browser automation keeps data private, which is rare in this space. How does it handle CAPTCHAs or bot detection on aggressive sites?
Most agent frameworks just roll their own desktop automation and call it done, so wrapping it as a proper MCP service that any agent can plug into is actually a smarter approach. The cross-platform support is a nice touch too. My question is how it handles rapid sequential clicks or form fills: does the MCP layer add enough latency to break those kinds of workflows?