Currently AI browser agents send screenshots to the model. Each screenshot costs thousands of tokens. Over a multi-step task, that means high latency and high API cost.
This package takes a different approach: it renders pages as ASCII wireframes with numbered elements. The agent sees [12]Sign Up instead of a 1280x720 image. Same information, far fewer tokens.
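To make the idea concrete, here is a minimal sketch of how a list of page elements could be turned into numbered text lines. It is not the package's actual API or output format: `Element`, `render_wireframe`, and the `[ref]label` layout are hypothetical names chosen for illustration, loosely modeled on the `[12]Sign Up` example above.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str   # e.g. "button", "input", "link" (hypothetical categories)
    label: str  # visible text or placeholder


def render_wireframe(elements, start_ref=1):
    """Return (text, refs): one numbered line per element plus a ref lookup.

    Hypothetical helper for illustration only; the real package may lay
    elements out spatially and number them differently.
    """
    lines = []
    refs = {}
    for ref, el in enumerate(elements, start=start_ref):
        refs[ref] = el
        if el.kind == "input":
            # Show inputs as a labeled blank so the model knows it can type here.
            lines.append(f"[{ref}]{el.label}: ____")
        else:
            lines.append(f"[{ref}]{el.label}")
    return "\n".join(lines), refs


page = [
    Element("link", "Home"),
    Element("input", "Email"),
    Element("button", "Sign Up"),
]
text, refs = render_wireframe(page, start_ref=10)
print(text)
# [10]Home
# [11]Email: ____
# [12]Sign Up
```

The `refs` mapping is what would let an agent answer with something like "click 12" instead of pixel coordinates, assuming the tool resolves the number back to the element.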
It started as a way to make my own agents cheaper to run. Then I built a package around it. It's fully open source, and feedback is welcome!
If someone is already using a Playwright-based MCP server (or a screenshot/vision-based computer-use setup), what’s the specific breaking point that typically makes them switch to Agent Browser, and what do they usually have to give up—if anything—in return for the token savings?
Agent Browser
ASCII wireframes are a cool idea. Maybe a small visual preview option could help too.
Agent Browser
@reid_anderson3 The library still exposes screenshot tooling, so the agent can take a screenshot if needed.
AutonomyAI
Great idea, best of luck with the launch :) It has a very 1984 hacker vibe :) I love it, and the practical applications seem very real, especially for scraping.
Agent Browser
@lev_kerzhner Thanks! It can also fill inputs, click buttons, and so on, using the numbered refs.
Product Hunt
Agent Browser
@curiouskitty The real breaking point is scale. With existing approaches, a single page can cost around 10k tokens, while Agent Browser typically uses only 1k–3k. And browser agents rarely perform just one action; they usually run multi-step workflows, so the difference compounds on every step. This means Agent Browser can reduce browser operation costs by roughly 70–90%. If a full screenshot is ever needed, the agent can still take one, so switching methods doesn't mean giving anything up.
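For anyone who wants to check the arithmetic, here is a rough sketch using the per-page figures quoted in this reply (about 10k tokens per page for existing approaches versus 1k–3k). The function name `token_savings` and the 20-step workflow are assumptions for illustration, and the model ignores prompt, history, and output tokens.

```python
def token_savings(baseline_per_page, wireframe_per_page, steps):
    """Return (tokens_saved, fraction_saved) for a workflow of `steps` page views.

    Simplified model: every step costs one full page observation and
    nothing else is counted.
    """
    baseline_total = baseline_per_page * steps
    wireframe_total = wireframe_per_page * steps
    saved = baseline_total - wireframe_total
    return saved, saved / baseline_total


# 20 steps is an assumed workflow length, not a measured figure.
for per_page in (1_000, 3_000):
    saved, fraction = token_savings(10_000, per_page, steps=20)
    print(f"{per_page} tokens/page: saves {saved} tokens ({fraction:.0%})")
# 1000 tokens/page: saves 180000 tokens (90%)
# 3000 tokens/page: saves 140000 tokens (70%)
```

Under this model the percentage saved stays the same regardless of the number of steps, while the absolute number of tokens saved grows linearly with workflow length.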
Trufflow
Where do you see the biggest savings in token usage? Is it on pages that are predominantly image-heavy?