
BrowserAct
Web browser automation for AI agents
1.6K followers
BrowserAct is built for agents that use the web. It gives agents a browser layer for real websites, so they can get past blocked pages, adapt to real scenarios, run multiple tasks safely, and return clean web data for reasoning. Use BrowserAct when an agent needs to browse, click, extract, fill forms, upload files, work inside logged-in sites, handle verification, or run repeatable browser workflows.






Congrats on the launch.
I currently use agent-browser, which is also OSS and agent first.
Why should I think about switching to this?
BrowserAct
@somangshu While the fundamental capabilities are quite similar, BrowserAct is significantly more powerful when it comes to bypassing access restrictions. Here is a quick comparison:
BrowserAct is one of those things I kept wishing existed, so happy to see it. The part that grabs me is letting agents actually drive a real browser instead of fighting with brittle scrapers or half-baked APIs. Curious about reliability though, when a site changes its layout, does the agent recover on its own, or do you end up babysitting the flows? Either way, nice work.
@yibo_wang3 Great question.
With BrowserAct, the agent can call `state` to read the current interactive elements, act on the current indexes, wait for the page to stabilize, and then re-run `state` to get fresh targets if the DOM changes. The `state` output also marks added or changed elements, which helps the agent focus on what shifted.
So for flaky DOM changes, the pattern is fresh browser state + indexed actions, not replaying stale selectors. If a full redesign changes the actual workflow logic, the agent may still need to re-explore the page.
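For anyone who wants the pattern spelled out, here is a minimal sketch of "fresh state + indexed actions". It does not call BrowserAct: `FakePage`, `Element`, `click_by_label`, and the `"new"` marker are hypothetical names invented for illustration, and the page is an in-memory stand-in. The point is only that the index is looked up in a freshly read state right before each action, so a layout shift does not send the click to the wrong element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    index: int   # position in the current state listing
    label: str   # visible text or accessible name
    marker: str  # "new" if absent from the previous listing, else ""


class FakePage:
    """Hypothetical in-memory stand-in for a browser session (not BrowserAct)."""

    def __init__(self, labels):
        self._labels = list(labels)
        self._previous = set()
        self.clicked = []

    def state(self):
        # Re-index the elements from scratch and mark the ones that are new.
        listing = [
            Element(i, label, "" if label in self._previous else "new")
            for i, label in enumerate(self._labels)
        ]
        self._previous = set(self._labels)
        return listing

    def click(self, index):
        self.clicked.append(self._labels[index])

    def mutate(self, labels):
        # Simulates a DOM change that shifts the indexes.
        self._labels = list(labels)


def click_by_label(page, label):
    """Read fresh state, then act on the index found in that same state."""
    for element in page.state():
        if element.label == label:
            page.click(element.index)
            return element.index
    raise LookupError(f"{label!r} not in current state; re-explore the page")


page = FakePage(["Search", "Login"])
first = click_by_label(page, "Login")        # "Login" sits at index 1
page.mutate(["Banner", "Search", "Login"])   # a banner pushes everything down
second = click_by_label(page, "Login")       # fresh state finds it at index 2
```

If the label is gone entirely, the helper raises instead of guessing, which corresponds to the "re-explore the page" case for a full redesign.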
The stale-position issue David raised is the thing I hit most. I do browser automation and every element ref I grab dies the moment the page navigates. I just re-find everything before each click, but it adds a round trip every time. Does BrowserAct batch that re-anchoring under the hood, or does the agent still need to request a fresh page state manually?
@yannikga When the page changes, the agent can call `state` to get the latest page state, then choose and click elements based on that fresh state. BrowserAct does not silently remap old element refs across navigations, because that can cause incorrect clicks on dynamic pages.
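One way to picture "no silent remapping" is to stamp every ref with the page state it came from and reject it once the page has moved on. The sketch below is an assumption about how such a guard could look, not BrowserAct's implementation; `Session`, `Ref`, `StaleRefError`, and the `epoch` counter are all hypothetical names.

```python
from dataclasses import dataclass


class StaleRefError(Exception):
    """Raised when a ref from an older page state is used."""


@dataclass(frozen=True)
class Ref:
    epoch: int  # which page state this ref was read from
    index: int


class Session:
    """Hypothetical session that invalidates refs on navigation."""

    def __init__(self):
        self.epoch = 0
        self.clicks = []

    def state(self, count):
        # Hand out refs stamped with the current epoch.
        return [Ref(self.epoch, i) for i in range(count)]

    def navigate(self):
        self.epoch += 1  # every earlier ref is now stale

    def click(self, ref):
        if ref.epoch != self.epoch:
            raise StaleRefError("ref predates the current page; call state() again")
        self.clicks.append(ref.index)


session = Session()
refs = session.state(3)
session.click(refs[1])          # accepted: ref matches the current epoch
session.navigate()
try:
    session.click(refs[1])      # rejected: ref was read before the navigation
    stale_rejected = False
except StaleRefError:
    stale_rejected = True
fresh = session.state(3)
session.click(fresh[2])         # accepted again after re-reading state
```

Failing loudly costs the extra round trip, but the agent never clicks whatever element happens to occupy an old position.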
hey! Browser automation for agents is genuinely exciting, the fact that it handles blocked pages and session isolation means agents can finally work on the real web, not just clean APIs.
One thing I'm curious about though, how does BrowserAct handle sites that fingerprint browser behaviour to detect bots? Because the hardest part of real-web automation isn't blocked pages, it's sites that let you in but silently serve you degraded or misleading data once they detect non-human patterns.
Is there any layer that makes the agent's browsing behaviour look more human at the request level?
@priyatharshini_c Great question. We see this as a browser identity and environment problem, not just a blocked-page problem.
BrowserAct uses a stealth browser layer with fingerprint spoofing, navigator patching, TLS fingerprint alignment, headless concealment, proxy options, and privacy or fixed-identity modes.
For sensitive workflows, the safer pattern is a stable identity: consistent fingerprint, static IP, cookies, and session state. For login-free batch work, privacy mode can use a fresh fingerprint and clean profile.
@double_chen The stealth layer is way more thorough than I expected. Three-layer isolation across fingerprint, network and session is basically a full identity stack per run.
But the thing that actually got me was skill-forge. An agent that explores a site, discovers the DOM patterns, packages a reusable Skill, and then never has to rewrite that automation again, that's the real unlock isn't it? Most teams are stuck in a loop of "scrape -> site changes -> rewrite." Forge breaks that loop completely.
Is forge stable enough for production workflows yet or still experimental?
@priyatharshini_c Thank you, and yes, that’s exactly the idea behind Skill Forge.
Skill Forge explores the site first, generates a reusable Skill, and then runs self-tests so the workflow is not just a loose recording.
We’re already using it daily to produce real production workflows. In fact, many of the Skills you see in the GitHub solutions were generated directly through this tool.
We’d love for you to try Skill Forge and tell us where it works well, where it breaks, and what we should improve next.
@double_chen Love that it's already running in production and not just a demo. That's the real test.
I'll definitely try Skill Forge and come back with honest feedback.
The scrape -> site changes -> rewrite loop is something I've hit personally so I'm genuinely curious how well it holds up on messier sites.
Excited to break it 😄
The human handoff is the piece I’d pressure-test. If an agent pauses for verification and resumes later, I’d want a small receipt for the URL/state, action intent, what the human changed, and what the agent is allowed to do after resume.
Otherwise the handoff becomes invisible state.
@blah_mad Completely agree, and thanks for the thoughtful suggestion.
Human handoff should not become invisible state. Today, when BrowserAct generates a `remote-assist` link, it includes the handoff objective, so the person knows what action is needed, such as login, verification, or manual confirmation. After that, the agent resumes in the same browser session.
Your “handoff receipt” idea is a great direction for us. Making the URL/state, intent, human action, and post-resume permissions more explicit would make handoff easier to audit and trust, especially for team workflows. We’ll take this into our roadmap.
Does that line up with what you meant? If not, I’d love to understand where the gap is.
@double_chen Yes, that lines up. The gap I’d pressure-test is the moment after resume: the agent should not continue just because the session is back. It should know what changed during handoff and which next actions are still allowed. That is the receipt I’d want for team workflows.
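To make the receipt idea concrete, here is a rough sketch of what such a record and a post-resume gate could look like. This is not an existing BrowserAct feature or API: `HandoffReceipt`, `may_proceed`, the field names, and the action strings are hypothetical, chosen to mirror the four items listed in the thread (URL/state, intent, human action, post-resume permissions).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HandoffReceipt:
    url: str                    # page the session was on at handoff
    intent: str                 # why the agent paused, e.g. "complete 2FA"
    human_changes: tuple = ()   # what the person did during the handoff
    allowed_after_resume: frozenset = field(default_factory=frozenset)


def may_proceed(receipt, action):
    """Allow an action only if the receipt explicitly lists it."""
    return action in receipt.allowed_after_resume


receipt = HandoffReceipt(
    url="https://example.com/checkout",
    intent="complete 2FA",
    human_changes=("entered 2FA code",),
    allowed_after_resume=frozenset({"read_order_status"}),
)

can_read = may_proceed(receipt, "read_order_status")
can_pay = may_proceed(receipt, "submit_payment")
```

The gate is deny-by-default: a receipt with no listed permissions allows nothing, so a resumed session alone is never enough for the agent to continue.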
finally. temporarily handing the session back to a human when the agent gets stuck instead of just being confused until failure is such a simple yet important fix. congrats on the launch, Wendy!
@wilder_dev Thank you!
This is exactly the pain point we set out to address. Real-world website workflows inevitably hit steps that require human intervention. With BrowserAct, the automation can pause, pass the intact live session to a user, and resume right where it stopped, without discarding any existing progress.
The handoff-to-human piece is what makes this actually usable in production. Most browser automation breaks the moment it hits a CAPTCHA or an unusual login flow, and then you're just stuck. Letting the agent pass control back cleanly and pick up from the same state is genuinely different from what's out there. Congrats on the #1 yesterday.
@schott_taylor Thank you! Really appreciate that.
That was exactly the gap we wanted to close. Some browser steps still need a person, especially around CAPTCHA, login, 2FA, or unusual verification flows.
With `remote-assist`, the agent can pause, hand off the active browser session, and then continue from the same state instead of restarting the workflow.
We’d love to hear your feedback after you try it in your own workflows.