Launched this week

BrowserAct
Web browser automation for AI agents
1.3K followers
BrowserAct is built for agents using the web. It gives agents a browser layer for real websites, so they can pass blocked pages, adapt to real scenarios, run multiple tasks safely, and return clean web data for reasoning. Use BrowserAct when an agent needs to browse, click, extract, fill forms, upload files, work inside logged-in sites, handle verification, or run repeatable browser workflows.






BrowserAct
Hey Product Hunt 👋
I'm Wendy, Senior Marketing Operations at BrowserAct.
AI agents work well in clean demos, but the real web is messy: login state, verification, dynamic pages, uploads, blocked flows, and browser sessions that interfere with each other. Most agents stop the moment a website pushes back. So we built a browser layer that doesn't.
BrowserAct reads the messy parts of the web your agent can't handle alone. It's a browser automation CLI that keeps session state, works through common web blocks, hands off to a human when needed, and returns clean web data for reasoning. The idea is simple: agents should automate what they can, ask for help when they're stuck, and continue from the same browser state afterward. You stay in control of all of it; nothing runs without your sign-off.
🎁 For Product Hunt: Get a free 7-day trial to test BrowserAct on a real browser workflow your agent keeps breaking on, no code needed.
Here all day, and would love your honest feedback. What browser task still breaks your agent today?
@wendyba Congrats on the launch. :)
BrowserAct seems like a practical way to help agents navigate the messy web flows. What kind of browser task has been the hardest to automate so far?
@rohanrecommends Thanks! The trickiest workloads typically run on sites with tight anti-bot and verification protections—think logged-in dashboards, marketplaces, social platforms, dynamic search pages, and workflows that combine CAPTCHAs, rate limits, shifting UI elements, plus manual approvals.
@wendyba Great launch. Quick question: When BrowserAct hands off to a human during a stuck flow, what does that experience look like for the person assisting? Specifically, how does the human see the current browser state and what steps are required to resume?
@swati_paliwal Hi, thanks!
Whenever a workflow requires human input, BrowserAct generates a remote-assist link that clearly states what action needs to be completed.
The person can open the link and view the exact live browser session and page state. They can finish required steps like logging in, passing verification prompts, scanning QR codes, or submitting manual confirmations. Once done, the agent picks back up within the original session.
@wendyba The hand-off to a human and resume from the same state is the part most browser agents skip, and it's exactly where mine breaks. Not blocked pages so much as dynamic reflow: the DOM shifts between read and click, so the agent acts on a stale position. Does BrowserAct re-anchor to elements semantically, or does keeping session state also cover layout drift mid-task?
@david_marko Great point. BrowserAct can pull the latest live page state ahead of every action the agent takes.
Following a manual handoff, the system refreshes its view of the active browser session before resuming work. This avoids outdated DOM snapshots or obsolete element coordinates captured prior to page updates.
In short, handoff preserves your ongoing session, while real-time state retrieval ensures all subsequent actions align with the page’s current live content.
@wendyba This is so cool! Congrats on the launch! So is this like Playwright MCP or Google Chrome headless CLI that I can give to my agents?
@xichiwoo Thank you!
BrowserAct is a browser layer for agents: real Chrome sessions, stealth mode, session persistence, clean extraction, verification handling, multi-session isolation, and human handoff when needed.
To try it, just send this instruction directly to your agent:
`Install this skill for me: https://github.com/browser-act/skills/tree/main/browser-act`
CheckYa
@wendyba I like that you're focusing on reliability rather than another AI wrapper. Handling authentication, CAPTCHAs, and browser state has always been the hard part. This feels like a strong foundation for real-world AI agents.
@monir_ Thank you! Really appreciate it. Would love to hear how it works for your workflows after you try it.
The "most agents stop the moment a website pushes back" framing is the real issue - we've had agent demos fall apart on something as basic as a cookie banner or a verification step. A resilience layer that keeps the agent moving through real-world friction makes a lot of sense.
The right side of this page shows Browser Use and Browserbase as alternatives. Where does BrowserAct specifically pull ahead - is the main angle the "clean output for reasoning" (returning structured data vs raw DOM to the agent) or is it more about the multi-session isolation piece? Genuinely curious what the core bet is here, since that changes a lot about which use cases you're best at.
@galdayan Good question. I don’t see this as a choice between clean output and session isolation. We need both, and both tie back to the core idea behind BrowserAct: keeping workflows running reliably on real, live sites.
Clean output stops the agent from drowning in messy raw DOM data so it can make better decisions. Session isolation is critical if you’re running multiple parallel tasks, staying logged into different accounts, or handling account-specific work. Things like cookie popups, active login status, captchas, blocked pages, manual confirmations, or short human intervention breaks all feed into one single goal: let the agent keep progressing and wrap up tasks fully.
This is where we think BrowserAct stands out. Instead of the agent dying quietly or having to restart the whole process every time something goes wrong, we built it to hold onto your active browser workflow no matter the interruptions.
A quick contrast to similar tools: Unlike Browserbase, we aren’t just building basic browser backend infrastructure. And versus Browser Use, our focus isn’t solely on simpler browser control. We built BrowserAct as a browser layer purpose-built for AI agents. It bundles isolated sessions, streamlined readable page data, automatic verification handling, and human takeover functionality—all tailored for those messy, unpredictable real-world automation flows.
@galdayan Another big distinction: BrowserAct prioritizes local operation first.
It runs alongside your native Chrome sessions locally. All your logged-in states and sensitive info stay on your device, fully within your control.
You’ll also save a lot on overhead, especially when you don’t have to offload every session to remote hosted browsers.
Browser automation is becoming one of the most important layers for AI agents, because so much business work still happens inside web apps that do not have clean APIs.
The hard part is reliability. I’d be curious how BrowserAct handles brittle UI changes, confirmations, and cases where the agent should stop and ask a human instead of guessing. That judgment layer is what separates a useful browser agent from a risky macro.
@rahulbhavsar With BrowserAct, the agent can inspect the current browser state, take an action, wait for the page to stabilize, and then re-read the latest state before moving to the next step. That helps with dynamic UI changes, layout shifts, and brittle page flows.
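A minimal sketch of that inspect, act, wait, re-read loop, written as generic Python rather than BrowserAct's actual API. The names `read_state`, `act`, `wait_until_stable`, and `run_steps` are hypothetical, and "stable" is assumed here to mean that two consecutive reads of the page return the same value.

```python
import time
from typing import Callable, List, Sequence, Tuple


def wait_until_stable(read_state: Callable[[], str], quiet_reads: int = 2,
                      interval: float = 0.0, max_reads: int = 20) -> str:
    """Poll read_state until it returns the same value quiet_reads times in a row."""
    last = read_state()
    streak = 1
    for _ in range(max_reads):
        if streak >= quiet_reads:
            return last
        time.sleep(interval)
        current = read_state()
        if current == last:
            streak += 1
        else:
            # The page changed under us: restart the count from the new state.
            last, streak = current, 1
    return last  # stop waiting and fall back to the latest read


def run_steps(read_state: Callable[[], str], act: Callable[[str, str], None],
              steps: Sequence[str]) -> List[Tuple[str, str]]:
    """Before every step, take a fresh, settled read so no action uses a stale snapshot."""
    log = []
    for step in steps:
        state = wait_until_stable(read_state)
        act(step, state)
        log.append((step, state))
    return log
```

The point of the design is that state is never cached across actions: each step pays for one extra read in exchange for not clicking on a position that moved.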
For sensitive or uncertain steps, BrowserAct uses confirmation gates and human handoff instead of letting the agent guess. If the workflow reaches login, verification, 2FA, manual approval, or any step that needs a person, `remote-assist` lets a human take over and then allows the agent to resume from the same browser session.
@rahulbhavsar Couldn't agree more, very well said. It would be interesting to see how it handles sites with brittle UI changes and CAPTCHA verification, for example.
The human-in-the-loop sign-off is a brave decision which looks like the right call. Most agent frameworks chase full autonomy and then faceplant the second a site throws a login wall or a verification step. We are building an agent that hops the same booking across regions and honestly DOM reflow is the least of it. The real wall is geo: switch country or currency mid-session and the anti-bot layer reads you as a fresh fingerprint and resets the whole thing. Does BrowserAct's session state survive a locale/IP change mid-flow or does that register as a new session?
@artstavenka1 In BrowserAct’s stealth browser, the browser environment can be aligned with the configured IP region, including things like language and timezone. If IP stability is the concern, using a static IP gives the browser a much more stable environment.
For a persistent/fixed browser setup, if a session is already open and you want changes like IP or region settings to take effect, you need to close the active sessions first. In private mode, a new session can pick up the new environment.
So I’d say BrowserAct helps keep the browser session and environment stable. The safer pattern is one stable browser identity per region.
Does that match your understanding of the issue? If I’m missing something, please correct me.
@carlos_hamilton44 Thanks! That is exactly the problem we designed this for.
BrowserAct handles this through remote-assist: when the agent hits a CAPTCHA, MFA, QR scan, security-key prompt, or another step that should be handled by a real user, the CLI pauses automation and returns a short-lived remote assist link for the same live browser session.
It does not restart the browser or create a fresh login context. The user operates the existing session, so cookies, local storage, tabs, proxy/profile state, and the current page context are preserved. While the user is in control, the agent stops sending browser commands.
After the user clears the blocker, the agent resumes by reading the current page state again from the live browser page. In practice, we treat the post-handoff page as the new source of truth rather than trying to replay stale DOM assumptions from before the challenge. That keeps the continuation clean after redirects, MFA completion, or DOM changes.
So the core idea is: pause automation, let the human complete the sensitive step in the same browser session, then re-sync from the live page and continue. Thanks for the thoughtful question!
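The pause, human step, re-sync sequence can be pictured as a small state machine. This is an illustrative sketch only: `HandoffSession`, `Mode`, and `read_page` are hypothetical names, not BrowserAct's API, and the "browser" is just a callable that returns the current page.

```python
from enum import Enum, auto
from typing import Callable


class Mode(Enum):
    AGENT = auto()   # the agent may send browser commands
    HUMAN = auto()   # a person is driving; agent commands are refused


class HandoffSession:
    """One long-lived session object that survives the handoff (no restart)."""

    def __init__(self, read_page: Callable[[], str]) -> None:
        self._read_page = read_page
        self.mode = Mode.AGENT
        self.page = read_page()

    def pause_for_human(self) -> None:
        # Automation stops; the same session stays open for the person.
        self.mode = Mode.HUMAN

    def send_command(self, command: str) -> str:
        if self.mode is Mode.HUMAN:
            raise RuntimeError("agent commands are blocked during handoff")
        return f"ran {command} on {self.page}"

    def resume(self) -> str:
        # Treat the live page as the new source of truth instead of
        # replaying whatever was cached before the challenge.
        self.page = self._read_page()
        self.mode = Mode.AGENT
        return self.page
```

In this sketch the only thing discarded on resume is the cached page view; the session object itself is never recreated.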
The detail that stands out to me is returning clean indexed data instead of raw DOM. On my own scraping agents half the token bill goes to dumping messy HTML into the model, so that part alone is worth a look.
What I'd actually worry about in prod is the CAPTCHA side. Auto-solving Turnstile and DataDome looks great on day one, but those vendors ship updates constantly and solve rates tend to rot fast. How do you keep that holding up over time, and when a job does fall back to a human, who eats that cost on a bulk run?
@kwan_tsui Thanks, that is exactly why BrowserAct returns compact indexed state instead of pushing raw DOM into the model.
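To make the token argument concrete, here is a rough sketch of what "compact indexed state" could look like, using only Python's standard `html.parser`. The output format and the names `IndexedState` and `compact_state` are assumptions for illustration; BrowserAct's real format is not shown in this thread.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Assumption: only these tags matter to an agent deciding what to click or fill.
INTERACTIVE = {"a", "button", "input", "select", "textarea"}


class IndexedState(HTMLParser):
    """Collects interactive elements as [tag, label] pairs, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.items: List[List[str]] = []
        self._open: Optional[int] = None  # element currently collecting inner text

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE:
            a = dict(attrs)
            label = a.get("aria-label") or a.get("placeholder") or a.get("name") or ""
            self.items.append([tag, label])
            # <input> is a void element, so it never has inner text to collect.
            self._open = None if tag == "input" else len(self.items) - 1

    def handle_endtag(self, tag):
        if tag in INTERACTIVE:
            self._open = None

    def handle_data(self, data):
        text = data.strip()
        # Use the first inner text as the label when no attribute supplied one.
        if self._open is not None and text and not self.items[self._open][1]:
            self.items[self._open][1] = text


def compact_state(html: str) -> List[str]:
    parser = IndexedState()
    parser.feed(html)
    parser.close()
    return [f"[{i}] {tag}: {label}" for i, (tag, label) in enumerate(parser.items)]
```

The agent can then refer to an element by its index, and the layout markup that makes up most of the raw HTML never reaches the model.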
On the CAPTCHA side, we don’t treat auto-solving as the only line of defense, because challenge systems change constantly and solve rates can decay. We keep improving the verification handling as different providers update their challenges, but we still avoid positioning it as “solve everything forever.”
BrowserAct uses a layered approach: first the environment layer, with stealth browser identity, session persistence, proxy controls, and static IP support to reduce unnecessary challenges. Then the execution layer, where `solve-captcha` can handle supported verification flows. And if a challenge still needs a person, `remote-assist` lets a human step into the active browser session and the agent resumes afterward.
Triforce Todos
Congrats on the launch!
The human handoff part is cool but how does the agent actually know when to ask for help vs just retrying on its own?
Is that a confidence threshold or something the agent decides itself?
@abod_rehman Great question. It’s not really a single confidence threshold.
The agent reads the live browser state and follows an escalation path. For normal UI changes, it can wait, re-check the page, and adjust the next action. If it detects a common verification challenge, BrowserAct can try automated handling with commands like `solve-captcha`.
But when the step clearly requires human identity or judgment, such as login, 2FA, OAuth, a security check, a QR scan, or manual confirmation, the agent should stop retrying and hand off through headed mode or `remote-assist`.
If the workflow needs a person, BrowserAct keeps the same browser session alive so the human can clear the step and the agent can continue from there.
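The escalation path described above can be sketched as a small decision function. The blocker labels, the `Next` enum, and `next_action` are hypothetical; the sketch only decides which rung to try and does not call `solve-captcha` or `remote-assist` itself. Routing unrecognized blockers to a human is an assumption in the spirit of "stop instead of guessing".

```python
from enum import Enum


class Next(Enum):
    RETRY = "retry"              # wait, re-check the page, adjust the action
    AUTO_VERIFY = "auto_verify"  # automated handling, e.g. a solve-captcha style step
    HANDOFF = "handoff"          # headed mode or a remote-assist style handoff


# Assumed labels for steps that need a person's identity or judgment.
HUMAN_ONLY = {"login", "2fa", "oauth", "security_check", "qr_scan", "manual_confirmation"}
# Assumed labels for challenges worth one automated attempt first.
AUTO_SOLVABLE = {"captcha"}


def next_action(blocker: str, auto_attempts: int = 0, max_auto_attempts: int = 1) -> Next:
    """Pick the next rung of the escalation ladder for a detected blocker."""
    if blocker == "ui_change":
        return Next.RETRY
    if blocker in AUTO_SOLVABLE and auto_attempts < max_auto_attempts:
        return Next.AUTO_VERIFY
    # Human-only steps, exhausted automated attempts, and unknown blockers
    # all end in a handoff rather than another retry.
    return Next.HANDOFF
```

For example, `next_action("captcha")` returns `Next.AUTO_VERIFY`, while `next_action("captcha", auto_attempts=1)` and `next_action("2fa")` both return `Next.HANDOFF`.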