Browser Agents that communicate using ASCII wireframes


Agent Browser - Browser Agents that communicate using ASCII wireframes

Agent Browser

3mo ago

Stop wasting tokens on screenshots. Agent Browser helps AI agents browse the web using wireframe snapshots rather than screenshots or DOM dumps.

Replies


Agent Browser

Maker


Currently, AI browser agents send screenshots to the model. Each screenshot costs thousands of tokens, so over a multi-step task that means high latency and high API cost. This package takes a different approach: it renders pages as ASCII wireframes with numbered elements. The agent sees [12]Sign Up instead of a 1280x720 image: the same actionable information in far fewer tokens. It started as a way to make my own agents cheaper to run, and then I built a package around it. It is fully open source, and feedback is welcome!
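To make the idea concrete, here is a minimal sketch of how a list of page elements could be turned into numbered wireframe text. This is not Agent Browser's actual code or output format: the `Element` class, the `render_wireframe` function, and the input-field notation are all assumptions made up for illustration. Only the `[12]Sign Up` form comes from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical page element; not Agent Browser's real data model."""
    kind: str                  # "text", "input", or "button"
    label: str
    ref: Optional[int] = None  # numeric ref for interactive elements only


def render_wireframe(elements: List[Element]) -> str:
    """Render one line per element, e.g. '[12]Sign Up' for a button."""
    lines = []
    for el in elements:
        if el.ref is None:
            # Plain text has no ref because the agent cannot act on it.
            lines.append(el.label)
        elif el.kind == "input":
            # Assumed notation for an empty input field.
            lines.append(f"[{el.ref}]<{el.label}: ____>")
        else:
            lines.append(f"[{el.ref}]{el.label}")
    return "\n".join(lines)


page = [
    Element("text", "Create your account"),
    Element("input", "Email", ref=11),
    Element("button", "Sign Up", ref=12),
]
wireframe = render_wireframe(page)
print(wireframe)
```

The whole page here fits in three short lines of text, which is where the token savings would come from.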


3mo ago

Smart approach — ASCII wireframes remind me of how screen readers parse UI, just optimized for token efficiency instead of accessibility. In my multi-agent OCR pipelines, visual input costs always bottlenecked the workflow. Curious whether the wireframe representation handles dynamic elements like modals or lazy-loaded content — that's usually where DOM-based approaches fail too.


30d ago

AutonomyAI

Great idea, best of luck with the launch :) It has a very 1984 hacker vibe :) I love it, and the practical applications seem very real, especially for scraping.


3mo ago

Agent Browser

Maker

@lev_kerzhner Thanks! It can also fill inputs, click buttons, and so on, using the numbered refs.
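As a rough illustration of acting through refs, the sketch below maps a ref number from the wireframe back to a concrete element and builds a command for it. Everything here is hypothetical: the `ref_table` contents, the selectors, the `resolve_action` helper, and the command shape are assumptions for the example, not the package's real API.

```python
from typing import Dict

# Hypothetical table built when the snapshot is taken: ref number -> selector.
ref_table: Dict[int, str] = {
    11: "input[name=email]",
    12: "button#signup",
}


def resolve_action(action: str, ref: int, value: str = "") -> dict:
    """Turn an agent request like ('click', 12) into a browser command."""
    if action not in ("click", "fill"):
        raise ValueError(f"unsupported action: {action}")
    if ref not in ref_table:
        # A stale or invented ref should fail loudly, not click something else.
        raise KeyError(f"unknown ref [{ref}]; take a fresh snapshot")
    command = {"action": action, "selector": ref_table[ref]}
    if action == "fill":
        command["value"] = value
    return command


fill_cmd = resolve_action("fill", 11, "me@example.com")
click_cmd = resolve_action("click", 12)
print(fill_cmd)
print(click_cmd)
```

The point of the indirection is that the model only has to emit a short number, while the selector details stay outside the prompt.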


3mo ago

Product Hunt

If someone is already using a Playwright-based MCP server (or a screenshot/vision-based computer-use setup), what’s the specific breaking point that typically makes them switch to Agent Browser, and what do they usually have to give up—if anything—in return for the token savings?


3mo ago

Agent Browser

Maker

@curiouskitty The real breaking point is scale. With existing approaches, a single page can cost around 10k tokens, while Agent Browser typically uses only 1k–3k. Browser agents rarely perform just one action; they usually run multi-step workflows, so the per-page difference compounds. That works out to a 70–90% reduction in browser operation costs. If a full screenshot is ever needed, the agent can still take one, so switching methods doesn’t mean giving anything up.
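The arithmetic behind those percentages can be checked with a few lines. The per-page figures (10k versus 1k–3k tokens) are the ones quoted above; the 20-step workflow length is an assumption picked only to show how the difference adds up.

```python
BASELINE_TOKENS_PER_PAGE = 10_000   # figure quoted above for existing approaches
WIREFRAME_TOKENS_LOW = 1_000        # lower end of the quoted wireframe range
WIREFRAME_TOKENS_HIGH = 3_000       # upper end of the quoted wireframe range
STEPS = 20                          # assumed workflow length, for illustration


def savings_percent(baseline: int, wireframe: int) -> int:
    """Percentage of tokens saved per page, using integer arithmetic."""
    return (baseline - wireframe) * 100 // baseline


worst_case = savings_percent(BASELINE_TOKENS_PER_PAGE, WIREFRAME_TOKENS_HIGH)
best_case = savings_percent(BASELINE_TOKENS_PER_PAGE, WIREFRAME_TOKENS_LOW)

baseline_total = BASELINE_TOKENS_PER_PAGE * STEPS
wireframe_total_low = WIREFRAME_TOKENS_LOW * STEPS
wireframe_total_high = WIREFRAME_TOKENS_HIGH * STEPS

print(f"savings per page: {worst_case}% to {best_case}%")
print(f"{STEPS} steps: {baseline_total} tokens vs "
      f"{wireframe_total_low} to {wireframe_total_high} tokens")
```

Because the saving is per page, the percentage stays the same as the workflow gets longer, while the absolute number of tokens saved grows with every step.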


3mo ago

Trufflow

Where do you see the biggest savings in token usage? Is it for when something is predominantly image heavy?


3mo ago

Agent Browser

Maker

@lienchueh When you build a browser agent, you have two options: use screenshots at every step, or use accessibility snapshots. Screenshots require a vision model with computer-use capabilities, which is costly, so most people have moved to accessibility snapshots. But accessibility snapshots dump much more data than the agent needs to work on the page. Agent Browser takes a different path and builds a wireframe from the elements visible on the page, which saves 70-90% of tokens depending on the page. So the savings apply to every page the agent visits.
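One way to picture "only the visible elements" is a filter that drops anything hidden or outside the viewport before the wireframe is built. The sketch below is a guess at the general technique, not a description of how Agent Browser actually decides visibility; the `Node` class, the `visible_nodes` function, and the 1280x720 default viewport are assumptions for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical element with a bounding box in page coordinates."""
    label: str
    x: int
    y: int
    width: int
    height: int
    hidden: bool = False  # e.g. styled so that it is not displayed


def visible_nodes(nodes: List[Node], viewport_w: int = 1280,
                  viewport_h: int = 720) -> List[Node]:
    """Keep nodes that are shown and overlap the viewport rectangle."""
    kept = []
    for n in nodes:
        if n.hidden or n.width <= 0 or n.height <= 0:
            continue
        overlaps_x = n.x < viewport_w and n.x + n.width > 0
        overlaps_y = n.y < viewport_h and n.y + n.height > 0
        if overlaps_x and overlaps_y:
            kept.append(n)
    return kept


nodes = [
    Node("Sign Up", 100, 200, 120, 40),
    Node("Cookie banner", 0, 0, 1280, 80, hidden=True),
    Node("Footer link", 100, 5000, 80, 20),  # far below the viewport
]
kept_labels = [n.label for n in visible_nodes(nodes)]
print(kept_labels)
```

A full accessibility snapshot would include all three nodes; a filter like this keeps only the one the agent could actually see and act on right now.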


3mo ago