
BrowserAct
Web browser automation for AI agents
1.6K followers
BrowserAct is built for agents using the web. It gives agents a browser layer for real websites, so they can pass blocked pages, adapt to real scenarios, run multiple tasks safely, and return clean web data for reasoning. Use BrowserAct when an agent needs to browse, click, extract, fill forms, upload files, work inside logged-in sites, handle verification, or run repeatable browser workflows.






Triforce Todos
Congrats on the launch!
The human handoff part is cool but how does the agent actually know when to ask for help vs just retrying on its own?
Is that a confidence threshold or something the agent decides itself?
@abod_rehman Great question. It’s not really a single confidence threshold.
The agent reads the live browser state and follows an escalation path. For normal UI changes, it can wait, re-check the page, and adjust the next action. If it detects a common verification challenge, BrowserAct can try automated handling with commands like `solve-captcha`.
But when the step clearly requires human identity or judgment, such as login, 2FA, OAuth, a security check, a QR scan, or manual confirmation, the agent should stop retrying and hand off through headed mode or `remote-assist`.
If the workflow needs a person, BrowserAct keeps the same browser session alive so the human can clear the step and the agent can continue from there.
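The escalation path described above can be sketched roughly as follows. This is a minimal illustration, not BrowserAct's implementation: the `PageState` categories, the `next_action` function, the re-check limit, and the choice to hand off after repeated failed re-checks are all assumptions. Only the `solve-captcha` and `remote-assist` command names come from the reply.

```python
from enum import Enum, auto


class PageState(Enum):
    """Hypothetical classification of what the agent sees in the live browser."""
    NORMAL = auto()       # page is ready for the planned action
    UI_CHANGED = auto()   # layout shifted or the target element is not there yet
    CAPTCHA = auto()      # common verification challenge
    NEEDS_HUMAN = auto()  # login, 2FA, OAuth, QR scan, manual confirmation


MAX_RECHECKS = 3  # assumed limit, not a documented BrowserAct setting


def next_action(state: PageState, rechecks: int = 0, captcha_tried: bool = False) -> str:
    """Return the next step on the escalation path for the given page state."""
    if state is PageState.NORMAL:
        return "continue"
    if state is PageState.UI_CHANGED:
        # Wait and re-check a few times before escalating (assumed fallback).
        return "wait-and-recheck" if rechecks < MAX_RECHECKS else "remote-assist"
    if state is PageState.CAPTCHA:
        # Try automated handling once, then escalate to a person.
        return "remote-assist" if captcha_tried else "solve-captcha"
    # Identity or judgment steps go straight to a human, with no retries.
    return "remote-assist"
```

The point of the sketch is that the decision is driven by what kind of obstacle is on the page rather than by a single confidence score.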
This looks fantastic, Wendy. The concept of an agent automating what it can, pausing for a human to clear a verification block, and then resuming the exact same session is a game-changer. I'm building an AI proposal tool right now and web data extraction is a constant headache when dealing with dynamic sites. Can't wait to test this out on a few broken workflows!
BrowserAct
@varunvivek Thanks so much, that means a lot. We designed BrowserAct for agents first, with things like anti-detection, better headless mode, remote assist, browser modes, and strong concurrency/isolation. The goal is to make browser tasks keep moving even when the web gets in the way. Would love to hear how it works on your proposal workflows once you test it.
HeyForm
This looks really useful, Wendy.
I like that BrowserAct treats the browser as part of the agent runtime, not just a place to send clicks. In my workflows, the hard part is never the click itself. It’s keeping the task alive when login state, popups, or verification get in the way.
@itsluo Yep, this was our main pain point to tackle.
It’s easy to automate basic clicks in a perfect environment, but actual browser work gets interrupted all the time: login prompts popping up, verification windows, blocked pages, steps that need manual judgment.
BrowserAct’s built to preserve your ongoing workflow. The agent can pause, recover, or let a human take over without wiping your session and starting over again.
For sites that actively fight automation, like ones with aggressive bot detection, what's the actual success rate looking like right now vs a normal site?
@boyuan_deng1 That’s a great question. We can’t offer a fixed universal success rate, as performance hinges on the target site, account history, traffic patterns, proxy quality, login state and task type.
Our solution is built around three functional layers. The first is our foundational browser environment, with native Chrome sessions, stealth configurations, session persistence and streamlined data extraction.
Second comes automated verification handling. When sites trigger anti-bot measures, BrowserAct detects blocked pages and runs tools like the `solve-captcha` command to resolve standard CAPTCHA and verification prompts where feasible.
Third is human handoff. If automation can’t clear a barrier, team members can intervene, and the agent resumes work within the original session afterward.
We don’t position BrowserAct as a tool to circumvent all site restrictions. Our core priority is uninterrupted task delivery, with zero loss of existing session progress.
congrats on the launch!
browser automation often breaks in real-world scenarios. love the focus on handling messy web interactions. what was the biggest technical challenge you solved first?
@avery_thompson2 Thank you!
The first big challenge was anti-bot challenges and interactive verification. Real sites don’t just need clicks, they bring CAPTCHA, login checks, 2FA, blocked pages, and changing UI.
So we built the escalation path early: real Chrome sessions, stealth mode, `solve-captcha` for supported challenge flows, and `remote-assist` when human help is needed.
You position BrowserAct as a browser layer rather than browser infrastructure. What capabilities would be difficult for a team to build themselves on top of Browserbase or Playwright?
BrowserAct
@yagnaveena Browserbase + Playwright give you a remote browser; they don't give you the layer that keeps agents reliable when sites change. Specifically:
- LLM-in-the-loop recovery when selectors break or DOM shifts mid-flow
- Skills: open-source, versioned, composable task definitions (so you're not locked in)
- Structured per-step verification (screenshot + DOM diff events) for debuggable agent runs
- Managed anti-bot, session persistence, captcha, proxies: one layer instead of five
- Native MCP, so any agent gets browser hands without glue code
You can build all of this — most teams we meet spend 6–12 months doing exactly that before deciding the leverage isn't there. If browser automation is your product, build it. If it's a means to an end, we'll give you that time back.
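To make the "screenshot + DOM diff events" item in the list above concrete, here is a rough sketch of what a per-step verification record could look like. The `StepEvent` and `record_step` names and the event shape are hypothetical, not BrowserAct's actual format. The diff uses Python's standard `difflib.ndiff`, and the screenshot is represented only by a file path.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class StepEvent:
    """Hypothetical record emitted after each agent step."""
    step: str
    screenshot_path: str  # where a screenshot of the page would be stored
    dom_diff: list[str] = field(default_factory=list)

    @property
    def changed(self) -> bool:
        """True if the step altered the DOM at all."""
        return bool(self.dom_diff)


def record_step(step: str, dom_before: str, dom_after: str, screenshot_path: str) -> StepEvent:
    """Build a step event whose diff holds only the added and removed DOM lines."""
    diff = difflib.ndiff(dom_before.splitlines(), dom_after.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # unchanged lines ("  ") and hint lines ("? ") are dropped.
    changes = [line for line in diff if line.startswith(("- ", "+ "))]
    return StepEvent(step, screenshot_path, changes)
```

An event with an empty `dom_diff` after a click is a useful debugging signal: the action ran, but nothing on the page changed.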
Session persistence is the hard part. If an agent opens a logged-in site, does BrowserAct keep that session alive for follow-up tasks, or re-authenticate every time? That determines a lot about latency in multi-step workflows.
BrowserAct
@christian_knaut 100% — this is the make-or-break for any agent doing more than one step. Short answer: sessions persist, auth state is reused across tasks. Happy to walk through how it works for your specific workflow if you want — DM me and I'll show you a multi-step example end to end.
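As a generic illustration of why reusing auth state matters for latency, here is a small sketch of a per-site session cache. It does not show how BrowserAct stores sessions: `AuthState`, `SessionStore`, and the `login` callback are hypothetical names, and expiry by timestamp is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AuthState:
    cookies: dict[str, str]
    expires_at: float  # Unix timestamp after which the session is unusable


class SessionStore:
    """Hypothetical per-site cache of auth state shared across tasks."""

    def __init__(self, login: Callable[[str], AuthState]):
        self._login = login
        self._states: dict[str, AuthState] = {}
        self.logins = 0  # how many times a full login was needed

    def get(self, site: str, now: Optional[float] = None) -> AuthState:
        """Return cached auth state for a site, logging in only when needed."""
        now = time.time() if now is None else now
        state = self._states.get(site)
        if state is None or state.expires_at <= now:
            # No usable session: pay the login cost once and cache the result.
            state = self._login(site)
            self._states[site] = state
            self.logins += 1
        return state
```

With a store like this, a five-step workflow on one site pays for a single login instead of five, which is the latency difference the question is getting at.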