Clawd Cursor - Desktop AI agent using screen reader APIs, not screenshots

by•4mo ago

Most AI desktop agents take a screenshot and send it to a vision model for every action. Clawd Cursor takes a different approach: it tries screen reader accessibility APIs first and falls back to vision only when needed. The reported result: 80% of tasks need zero LLM calls, and it runs 6x faster and 30x cheaper than screenshot-based agents. Built with TypeScript, it connects via VNC and uses an action router that tries accessibility APIs first, then task decomposition, then AI vision as a last resort.
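The tiered routing described above can be sketched roughly as follows. This is an illustration of the fallback pattern only, not Clawd Cursor's actual code: the `Strategy` interface, `routeAction`, and the three stub strategies are hypothetical names invented for this example, and a real router would presumably be asynchronous.

```typescript
// Hypothetical types: none of these names come from Clawd Cursor's codebase.
type Outcome = { handled: boolean; detail?: string };

interface Strategy {
  name: string;
  attempt(task: string): Outcome;
}

interface RouteResult {
  strategy: string | null; // which tier handled the task, or null if none did
  attempted: string[];     // every tier that was tried, in order
  detail?: string;
}

// Try each strategy in order and stop at the first one that handles the task.
function routeAction(task: string, strategies: Strategy[]): RouteResult {
  const attempted: string[] = [];
  for (const s of strategies) {
    attempted.push(s.name);
    const outcome = s.attempt(task);
    if (outcome.handled) {
      return { strategy: s.name, attempted, detail: outcome.detail };
    }
  }
  return { strategy: null, attempted };
}

// Stub tiers standing in for accessibility APIs, task decomposition, and vision.
const accessibility: Strategy = {
  name: "accessibility",
  attempt: (task) =>
    task.startsWith("click ")
      ? { handled: true, detail: `invoked ${task.slice(6)}` }
      : { handled: false },
};

const decomposition: Strategy = {
  name: "decomposition",
  attempt: (task) =>
    task.startsWith("open ")
      ? { handled: true, detail: "split into smaller steps" }
      : { handled: false },
};

const vision: Strategy = {
  name: "vision",
  // Last resort: in this sketch it accepts anything the earlier tiers declined.
  attempt: () => ({ handled: true, detail: "sent screenshot to vision model" }),
};

const tiers: Strategy[] = [accessibility, decomposition, vision];
```

In this sketch, `routeAction("click Save", tiers)` stops at the first tier, while a task that neither cheaper tier recognizes falls all the way through to `vision`.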

Replies

Maker

Hey everyone! I built Clawd Cursor because I was frustrated with how slow and expensive existing AI desktop agents are. They screenshot everything and send it to GPT-4V for every single click.

My approach: use screen reader accessibility APIs first. The OS already knows what's on screen: button names, text fields, menu items. Why ask an AI to figure that out from pixels?

The result: 80% of desktop tasks (clicking buttons, filling forms, navigating menus) need zero LLM calls. It only falls back to AI vision for complex visual tasks.

It's open source, built with TypeScript, and connects to any desktop via VNC. Would love your feedback!
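To illustrate the point that the OS already exposes button names and text fields, here is a minimal sketch of looking up a control in an accessibility tree by role and name. The `A11yNode` shape, `findNode`, and the sample tree are assumptions made up for this example; real platform accessibility APIs expose richer structures, and this is not taken from the project's source.

```typescript
// Hypothetical node shape, simplified for illustration.
interface A11yNode {
  role: string;
  name: string;
  children?: A11yNode[];
}

// Depth-first search for the first node matching the given role and name.
function findNode(root: A11yNode, role: string, name: string): A11yNode | undefined {
  if (root.role === role && root.name === name) return root;
  for (const child of root.children ?? []) {
    const hit = findNode(child, role, name);
    if (hit) return hit;
  }
  return undefined;
}

// A made-up window: a dialog containing a text field and two buttons.
const sampleWindow: A11yNode = {
  role: "window",
  name: "Export",
  children: [
    { role: "textfield", name: "File name" },
    {
      role: "group",
      name: "Actions",
      children: [
        { role: "button", name: "Save" },
        { role: "button", name: "Cancel" },
      ],
    },
  ],
};

const saveButton = findNode(sampleWindow, "button", "Save");
```

A lookup like this needs no model call at all: the target is identified by its declared role and label rather than inferred from pixels.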

4mo ago