
BrowserAct
Web browser automation for AI agents
1.6K followers
BrowserAct is built for agents using the web. It gives agents a browser layer for real websites, so they can get past blocked pages, adapt to real scenarios, run multiple tasks safely, and return clean web data for reasoning. Use BrowserAct when an agent needs to browse, click, extract, fill forms, upload files, work inside logged-in sites, handle verification, or run repeatable browser workflows.






David's reflow point is the one that gets me too. The agent reads the page, but by the time it clicks, the element has already moved, so it acts on a stale position. Re-reading live state before each action sounds like the right fix. Does pulling fresh state every step add enough latency to slow longer tasks, or is it cheap enough to just always do?
@yannikga Good question. Yes, re-reading live browser state adds a bit of latency, but in most real workflows it is much cheaper than acting on stale state and breaking the task.
BrowserAct can re-read the latest browser state before important actions, especially after navigation, layout shifts, form submissions, or human handoff. That helps the agent avoid stale element positions on dynamic pages.
So the tradeoff is intentional: a small amount of latency for much better reliability.
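One way to keep that latency small is to re-read only after events that can invalidate the snapshot. The sketch below is a minimal illustration under stated assumptions: `StateCache`, `read_state`, and the event names are hypothetical and are not BrowserAct's API. The page is a plain Python list standing in for a browser.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Events after which a cached snapshot is treated as stale.
# Illustrative names, mirroring the cases listed in the reply above.
INVALIDATING_EVENTS = {"navigation", "layout_shift", "form_submit", "human_handoff"}


@dataclass
class StateCache:
    """Caches a page snapshot and re-reads it only when it may be stale."""

    read_state: Callable[[], List[str]]  # hypothetical reader of interactive elements
    reads: int = 0                       # counts how many real reads happened
    _snapshot: Optional[List[str]] = field(default=None, repr=False)

    def notify(self, event: str) -> None:
        """Drop the snapshot if the event can change element positions."""
        if event in INVALIDATING_EVENTS:
            self._snapshot = None

    def current(self) -> List[str]:
        """Return the cached snapshot, reading fresh state only if needed."""
        if self._snapshot is None:
            self._snapshot = self.read_state()
            self.reads += 1
        return self._snapshot


# A list stands in for the live page.
page = ["Search box", "Submit button"]
cache = StateCache(read_state=lambda: list(page))

cache.current()                   # first read hits the page
cache.current()                   # served from the cache, no extra read
page.insert(0, "Cookie banner")   # the layout shifts
cache.notify("layout_shift")
fresh = cache.current()           # re-read picks up the new layout
```

The cost of a fresh read is paid only when something could have moved, which is the same tradeoff described above: a little latency in exchange for not clicking a stale position.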
Solid
Excited for this! I've noticed that some CAPTCHAs are getting stricter on datacenter IP addresses. Do you solve this for the toughest CAPTCHAs?
@tkeith Great question. To set expectations clearly: BrowserAct does not claim to fully auto-solve every complex CAPTCHA, especially on datacenter IPs, which often trigger stricter anti-bot checks.
We handle verification challenges in three layers.
First is the environment layer: stealth browser profiles, persistent sessions, proxy controls, and static proxy support. This helps keep the browser fingerprint, IP region, language, timezone, and session context consistent across runs.
Second is automated verification handling. If a supported CAPTCHA or verification challenge appears during the task, BrowserAct can use `solve-captcha` to handle it where possible.
Third is human handoff. For tougher challenges that require a real person, `remote-assist` lets a human step into the active browser session, clear the verification, and then the agent resumes from the same state.
The goal is to keep legitimate agent workflows moving through real-world verification friction without losing progress or breaking the session.
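The three layers can be read as an escalation policy. The sketch below is only an illustration of that ordering: the `Environment` fields, the `SUPPORTED_CHALLENGES` set, and `handle_verification` are hypothetical. Only the names `solve-captcha` and `remote-assist` come from the reply above, and they are used here as labels, not as real calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    """Layer 1: settings kept identical across runs (hypothetical fields)."""

    proxy_region: str
    language: str
    timezone: str


# Assumed set of challenge types an automated solver can handle.
SUPPORTED_CHALLENGES = {"checkbox", "image_grid"}


def handle_verification(challenge: Optional[str], human_available: bool) -> str:
    """Pick the layer that should deal with a challenge, escalating in order."""
    if challenge is None:
        return "no challenge"      # the consistent environment was enough
    if challenge in SUPPORTED_CHALLENGES:
        return "solve-captcha"     # layer 2: automated handling
    if human_available:
        return "remote-assist"     # layer 3: a person steps into the session
    return "blocked"               # nothing left to try; surface it to the caller


env = Environment(proxy_region="us-east", language="en-US", timezone="America/New_York")
decision = handle_verification("device_check", human_available=True)
```

Freezing the environment object reflects the point of the first layer: fingerprint, region, language, and timezone stay the same from run to run.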
Selector stability breaks more agent runs than the reasoning does. Do you lean on the accessibility tree or visual grounding when the DOM shifts?
@sabber_ahamed Great question. BrowserAct does not lean on stale selectors or fixed coordinates as the source of truth.
The core model is live, indexed browser state. The agent calls `state` to get the current interactive elements, acts on the current index, waits for the page to stabilize, and then re-reads state when the page changes.
So when the DOM shifts, the pattern is not “reuse the old selector and hope.” It is “refresh the current page state, get fresh action targets, then act.”
Visual inspection can help in some cases, but the main reliability path is compact state + indexed actions, designed for agents rather than hand-written selectors.
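The read, act, re-read loop can be shown with a toy page. This is a sketch under assumptions: `FakePage` and `act` are hypothetical, and its `state()` method only mimics the idea of the `state` call mentioned above by returning an index-to-label map.

```python
from typing import Dict, List


class FakePage:
    """Toy page whose element indexes shift when the layout changes."""

    def __init__(self) -> None:
        self.elements: List[str] = ["Email", "Password", "Sign in"]
        self.clicked: List[str] = []

    def state(self) -> Dict[int, str]:
        """Return the current interactive elements keyed by index."""
        return dict(enumerate(self.elements))

    def click(self, index: int) -> None:
        """Click whatever currently sits at the given index."""
        self.clicked.append(self.elements[index])

    def shift_layout(self) -> None:
        """Simulate a banner appearing and pushing every element down by one."""
        self.elements.insert(0, "Promo banner")


def act(page: FakePage, label: str) -> None:
    """Re-read state, find the fresh index for the label, then click it."""
    snapshot = page.state()
    index = next(i for i, text in snapshot.items() if text == label)
    page.click(index)


page = FakePage()
stale_index = next(i for i, text in page.state().items() if text == "Sign in")
page.shift_layout()

page.click(stale_index)   # stale index now points at "Password"
act(page, "Sign in")      # fresh state finds the right target
```

The first click lands on the wrong element because the index was taken before the shift. The second one succeeds because the index is resolved against the state read immediately before the action.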
Seems like a much-needed tool that I'd definitely want to check out. Curious how this interacts with parallel agent sessions that may or may not have overlapping browser needs. Will each agent have its own isolated browser layer, or do they share one? And can they coordinate across the same browser if needed?
@noice30sugar Thanks! This is one of the core design points in BrowserAct.
BrowserAct separates browser identity from session workspace. If agents need isolation, each agent can run against its own browser identity with separate cookies, fingerprint, proxy, and login state.
If they intentionally need to work under the same account or browser context, they can open multiple sessions on the same browser. Those sessions share login state, but each has its own navigation and task flow, so parallel work does not block the others.
So the model is flexible: separate browsers for isolated agents, multiple sessions on one browser for shared-account workflows, and privacy mode when you want zero residue.
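The split between browser identity and session workspace can be modelled as two small types. This is a hypothetical data model for illustration only: `BrowserIdentity`, `Session`, and their methods are assumptions and do not describe BrowserAct's internals.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BrowserIdentity:
    """Owns what persists across sessions: cookies and login state."""

    name: str
    cookies: Dict[str, str] = field(default_factory=dict)
    sessions: List["Session"] = field(default_factory=list, repr=False, compare=False)

    def open_session(self) -> "Session":
        """Open a new workspace that shares this identity's cookies."""
        session = Session(browser=self)
        self.sessions.append(session)
        return session


@dataclass
class Session:
    """Owns what is per-task: the current URL and navigation flow."""

    browser: BrowserIdentity
    url: str = "about:blank"

    def navigate(self, url: str) -> None:
        self.url = url

    def login(self, token: str) -> None:
        self.browser.cookies["auth"] = token   # stored on the shared identity

    def is_logged_in(self) -> bool:
        return "auth" in self.browser.cookies


# Two sessions on one browser share login state but not navigation.
shared = BrowserIdentity("shared-account")
a, b = shared.open_session(), shared.open_session()
a.login("token-123")
a.navigate("https://example.com/inbox")

# A separate identity starts with no cookies at all.
isolated = BrowserIdentity("isolated-agent").open_session()
```

Here `b` is logged in because `a` logged in on the same identity, yet `b` is still on its own page. The isolated session sees none of it.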
If you had to pick one feature that makes BrowserAct outperform existing browser automation tools for AI agents, what would it be? I'd love to understand the biggest practical difference before giving it a spin.
BrowserAct
@sameerkatel Great question! The biggest practical difference is reliability of the result, not just the action. Most browser automation tools fire off the steps and report "success" the moment the script finishes — even if the page didn't actually load, the data came back empty, or a popup silently blocked the click. BrowserAct verifies each step actually produced the intended outcome before moving on, so your agent doesn't quietly hand you broken data. Once you run a few real workflows you'll feel the difference. Would love to hear what you build with it!
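The idea of checking the outcome rather than the action can be sketched as a wrapper that pairs each step with a postcondition. `run_verified` and `StepFailed` are hypothetical names for illustration, and the flaky extractor is simulated with an iterator.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class StepFailed(Exception):
    """Raised when a step never produced a result that passed its check."""


def run_verified(action: Callable[[], T], check: Callable[[T], bool], retries: int = 2) -> T:
    """Run an action and return its result only once the check accepts it."""
    for _ in range(retries + 1):
        result = action()
        if check(result):
            return result
    raise StepFailed(f"outcome not verified after {retries + 1} attempts")


# Simulated extraction: the first attempt returns empty data, the second succeeds.
responses = iter([[], ["row 1", "row 2"]])
rows = run_verified(lambda: next(responses), check=lambda r: len(r) > 0)
```

A script that only checked "the call returned" would have accepted the empty list from the first attempt. With the postcondition, empty data counts as a failed step and is retried or raised.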
I’ve run into the same brittle-browser-step problem when testing agent flows. The e2e recording angle is interesting — curious how you handle flaky DOM changes after a site redesign?
BrowserAct
@xiaosong001 Yeah, brittle selectors are the whole reason recorded flows rot. We don't lock steps to fixed locators — the agent resolves each step by intent at runtime, so a redesign that moves or renames elements usually doesn't break the run. Big structural overhauls can still need a touch-up, but the everyday "new layout killed my flow" case mostly disappears. Curious what kinds of sites you've seen break the hardest?
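To make "resolve by intent at runtime" concrete, here is a deliberately simple stand-in: it scores the labels on the current page by word overlap with the intent. This is an assumption for illustration; a real resolver would likely use much richer signals, and `resolve` is a hypothetical function, not part of BrowserAct.

```python
from typing import List


def resolve(intent: str, labels: List[str]) -> int:
    """Return the index of the label sharing the most words with the intent."""
    wanted = set(intent.lower().split())

    def score(label: str) -> int:
        return len(wanted & set(label.lower().split()))

    best = max(range(len(labels)), key=lambda i: score(labels[i]))
    if score(labels[best]) == 0:
        raise LookupError(f"nothing on the page matches intent: {intent!r}")
    return best


# The same recorded step resolves correctly before and after a redesign.
old_layout = ["Add to cart", "Checkout"]
new_layout = ["Menu", "Add to cart", "Proceed to checkout"]

old_target = resolve("checkout", old_layout)
new_target = resolve("checkout", new_layout)
```

The recorded step stores the intent, not a locator, so moving or renaming the button changes the resolved index without breaking the step. A page with no matching element raises instead of guessing, which corresponds to the "big structural overhaul" case that still needs a touch-up.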
A lot of sites explicitly disallow automated access in their ToS, especially the ones with login walls or verification steps. Where does BrowserAct draw the line on which sites it'll automate against, is that left entirely to the user's judgment, or are there categories you won't touch regardless of what the agent's trying to do?
BrowserAct
@ansari_adin Great question — rather than give a blanket answer, we'd encourage testing against your specific use case, since legitimate automation needs vary a lot. We do offer custom/enterprise engagements where we work through requirements and compliance together — if you've got a specific business case, reach out and we'll talk through what's feasible.
@wendyba That answer works for enterprise customers willing to have that conversation, but it sidesteps the actual question for the self-serve tier, where presumably there's no one reviewing what people are automating against before they do it. Is BrowserAct comfortable with that gap existing, or is policing self-serve usage something you're actively thinking about as the user base grows?
@ansari_adin Fair point, and I appreciate you pushing on the self-serve side.
For self-serve, we don’t manually review every workflow before it runs, similar to how AWS or Google Cloud don’t pre-approve every workload customers run on their infrastructure. BrowserAct provides browser automation capabilities, and users are responsible for making sure their use is authorized, lawful, and compliant with the sites they interact with.
Our terms require users to follow applicable laws, target-site terms, robots.txt, and access restrictions, and they prohibit illegal, unauthorized, harmful, or fraudulent use.
So self-serve is not meant to be a blank check. The baseline is legitimate, authorized automation.
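For the robots.txt part, a user-side check can be done with Python's standard `urllib.robotparser`. This is a sketch of what a user might run before automating a site, not a description of anything BrowserAct enforces. The rules and the `example-agent` user agent are made up, and the file is parsed from a string so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Example rules; a real check would load the target site's own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url: str, user_agent: str = "example-agent") -> bool:
    """Report whether the parsed rules permit this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


public_ok = is_allowed("https://example.com/products")
account_ok = is_allowed("https://example.com/account/settings")
```

With these rules the product page is permitted and the account page is not. robots.txt covers only one of the obligations listed above, so a check like this does not replace reading the site's terms.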
If you’d like to discuss a specific use case with the team, you’re very welcome to join our Discord:
https://discord.com/invite/UpnCKd7GaU