I joined Tabstack four weeks ago. The fastest way I know to understand a product is to build something real with it: not tutorials, not toy examples, but an actual app that uses the API under real conditions and breaks in interesting ways.
So I built Rival, an open-source competitive intelligence dashboard that tracks competitor pricing, changelogs, careers, docs, and GitHub signals, diffs what changes, and generates intelligence briefs automatically.
Tabstack by Mozilla
@Tabstack by Mozilla keeps cooking.
Today, the team is launching not 1, not 2, not 3, but 4 new features. Introducing:
Tabstack CLI for quick automation and scripting - view source code on GitHub
Tabstack MCP server that gives your AI assistant direct access to Tabstack - read docs
Tabstack agent skill for your @OpenClaw or Hermes agent
Tabstack for Raycast - an extension for scraping data without leaving @Raycast
Go scrape something today.
@fmerian Converting data scraped from a website into a schema is a universal problem. I will surely give it a try.
Tabstack by Mozilla
@gokuljd yes! and with this launch, we hope @Tabstack by Mozilla fits perfectly into your existing dev workflow.
Looking forward to seeing what you're building with it.
Tabstack by Mozilla
@gokuljd btw @tessak22 is going live later today at 12 PM SF time to walk through these new features. Tune in!
The unified API abstraction on top of scraping is clever. We've hit the selector-maintenance problem building data pipelines where a single HTML change breaks weeks of work. Does it use headless browser pooling or something more lightweight for dynamic content, and how do you handle rate-limiting per domain when multiple callers share the same API key?
Tabstack by Mozilla
@retain_dev Excellent questions!
For content extraction we use several different strategies, including headless browsers. However, as you alluded to, not every site needs a full headless browser. Sometimes a simple HTTP request will do the trick. Tabstack aims to pick the most efficient strategy based on the requested URL. Extraction effort is also configurable; you can read more about it here: https://tabstack.ai/blog/fetch-effort-parameter
To prevent multiple callers from hammering the same site over and over, we use caching and honor robots.txt directives that target our user agent.
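For readers unfamiliar with how a robots.txt check against a specific user agent works, here is a minimal sketch using Python's standard `urllib.robotparser`. This is not Tabstack's implementation; the `ExampleBot` user agent, the rules, and the URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group targets a specific crawler,
# the other applies to everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network call is needed here.
parser.parse(ROBOTS_TXT.splitlines())

# The group that names ExampleBot decides what ExampleBot may fetch.
allowed_public = parser.can_fetch("ExampleBot", "https://example.com/pricing")
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/report")

print("pricing allowed:", allowed_public)
print("private allowed:", allowed_private)
```

A crawler that honors directives aimed at its own user agent would skip the second URL and fetch the first.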
Tabstack by Mozilla
Curious: is this something you've experimented with yourself using another product? Would love to have your feedback about the first-time experience using @Tabstack by Mozilla - start here: tabstack.ai
One API call for structured JSON, markdown, and browser automation is a solid combo. Does the schema validation handle edge cases well when a site's layout changes, or does it need manual updates?
Tabstack by Mozilla
@doganakbulut Great question. The trick is you're not writing selectors. You define a JSON schema for the fields you want, and every call re-reads the page and maps it to that schema. So when a site ships a redesign, there's nothing to patch. No selectors to go stale.
Couple things to know: if a field straight up isn't on the page anymore, you get null back for it instead of the whole call blowing up. And for heavy JS pages, set effort to 'max' so it fully renders first.
So you maintain the schema, not the scraping logic. And the schema only changes when the data you want changes.
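To make "maintain the schema, not the scraping logic" concrete, here is a rough sketch of what a schema and a request body could look like for a pricing page. The schema itself is plain JSON Schema. The field names, the `build_extract_request` helper, and the `url` and `json_schema` keys are assumptions for illustration, not the confirmed request shape; only the `effort` value `'max'` comes from the reply above, so check the docs for the real parameter names.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical schema: each field carries a type and a short description.
# Fields allow null so a value missing from the page is representable.
PRICING_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "plan_name": {
            "type": ["string", "null"],
            "description": "Name of the pricing plan",
        },
        "monthly_price_usd": {
            "type": ["number", "null"],
            "description": "Monthly price in US dollars",
        },
    },
    "required": ["plan_name", "monthly_price_usd"],
}


def build_extract_request(
    url: str, schema: Dict[str, Any], effort: Optional[str] = None
) -> Dict[str, Any]:
    """Assemble a request body. Key names are assumptions, not the real API."""
    body: Dict[str, Any] = {"url": url, "json_schema": schema}
    if effort is not None:
        # Per the reply above, 'max' is meant for heavy JS pages.
        body["effort"] = effort
    return body


request_body = build_extract_request(
    "https://example.com/pricing", PRICING_SCHEMA, effort="max"
)
print(json.dumps(request_body, indent=2))
```

Note that nothing in the body refers to selectors or DOM positions; the only thing to edit when requirements change is `PRICING_SCHEMA`.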
Tabstack by Mozilla
@doganakbulut Thanks for your support, and great question. Random idea here: a tool to get the JSON Schema for any URL. @tessak22 wdyt?
Very interesting approach. Most web extraction tools eventually struggle when sites change their structure. How does Tabstack handle schema reliability over time without developers constantly updating extraction rules? Is there a point where human intervention is still required, or is the adaptation fully automated?
Tabstack by Mozilla
@janani_2001 Schema-based, not selector-based. You define the fields you want and a short description of each, and the model maps page content by meaning instead of position. So when a site reshuffles its DOM but still shows the same info, nothing on your end changes.
Human intervention comes in when what you want changes, not when the page does. New field, you add it to the schema. And if a page stops carrying something, extract.json returns null for that field instead of failing, so you catch it instead of getting silently wrong data.
So layout churn is handled for you. Deciding what to pull is still yours.
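Since a missing field comes back as null rather than failing the call, the caller can flag those gaps with a small check. Here is a minimal sketch, assuming the extraction result arrives as a plain dictionary keyed by the schema's field names; the `missing_fields` helper and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def missing_fields(schema: Dict[str, Any], result: Dict[str, Any]) -> List[str]:
    """Return schema fields that are null (None) or absent in the result."""
    return [name for name in schema["properties"] if result.get(name) is None]


# Hypothetical schema and result, for illustration only.
schema = {
    "type": "object",
    "properties": {
        "plan_name": {"type": ["string", "null"]},
        "monthly_price_usd": {"type": ["number", "null"]},
    },
}
result = {"plan_name": "Pro", "monthly_price_usd": None}

gaps = missing_fields(schema, result)
if gaps:
    print("Fields no longer found on the page:", ", ".join(gaps))
```

Running a check like this after every extraction turns a null into an alert, which is the point where a human decides whether the schema or the source needs to change.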
Tabstack by Mozilla
You're spot on. Is this something you've experimented with yourself using another product? Would love to have your feedback about the first-time experience using @Tabstack by Mozilla - get started here: tabstack.ai
@tessak22 @fmerian Thanks for the detailed explanation! The schema-based approach makes a lot of sense, especially compared to brittle selector-based extraction.
I've seen similar challenges in test automation, where UI changes can break scripts even when the underlying user workflow hasn't changed. It's interesting to see the same problem being solved from a data extraction perspective.
I'm curious, how do you evaluate extraction quality over time? Do you have any automated validation or confidence scoring to detect when the model might be returning plausible but incorrect data?
The schema-first approach is interesting. Have you found that users spend more time defining the schema they want, or cleaning up the extracted data afterward? Curious where the bottleneck usually ends up.
Tabstack by Mozilla
@surabhi_minocha Great question. Schema-first moves the bottleneck to the front, and shrinks it.
With most extraction, the work lives on the back end. You get messy output and clean it, every run, forever. Schema-first flips that. You define the shape once, and Tabstack does the cleanup for you. The model reads the raw page, normalizes the values, and returns data that's already typed and in your desired shape. No separate cleanup pass on your side.
So the time does go into defining the schema. But that's a one-time decision, not a per-run tax. And it's mostly just deciding what you actually want out of the page, which is the part you wanted to think about anyway.
If you keep an eye out, you'll see our launch next week that's focused on schemas. 😉
Tabstack by Mozilla
@tessak22 spoiler alert 🙈
"URL + schema in, clean JSON out" is exactly what I keep wishing for. I drive a lot of browser automation for my own agents and the thing that always bites me isn't the first run, it's the site quietly changing its DOM a week later and everything breaking silently. Does the schema-based extraction hold up when a page's layout changes, or does it need re-tuning? Mozilla-backed and "never trained on your data" is a strong trust angle too. Congrats on the launch.
Tabstack by Mozilla
@david_marko Thank you, that means a lot. This is exactly the problem we built around.
It holds up without re-tuning. Nothing's tied to the DOM, so when a site changes its markup, extraction keeps working. The model finds your fields by what they mean, not where they sit on the page.
And nothing breaks silently. If the data ever leaves the page, the field comes back null, so you see the gap instead of shipping bad data.
Trust matters to us as much as to you. That's the Mozilla manifesto. 🫶
Tabstack by Mozilla
Privacy, transparency, and control. You can read the Mozilla manifesto here: https://www.mozilla.org/about/manifesto/
For launch/community ops, I would use the MCP server to turn docs, changelog pages, and competitor pages into a weekly pre-release brief. The edge case I would test first is source freshness. If a teammate asks the agent to re-check one URL after a cached extraction, can Tabstack force a fresh read for that source while still using cache for the rest?
Tabstack by Mozilla
@hazy0 Love this use case, and you picked exactly the right edge case to poke at.
Yes, you can. The nocache flag is per-request, not a global mode. A brief like that is really a fan-out of individual extract calls, one per URL, so cache is decided per source. When your teammate wants to re-verify a single page, set nocache: true on that one call and leave it off the others. That source gets a fresh read while the rest of the brief still answers from cache, so you never have to bust the whole run to re-check one link.
If you want everything fresh later, setting nocache: true on every call does that too, but for your scenario the per-source control is exactly what you're after.
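The fan-out described above could be planned like this: one request body per URL, with `nocache: true` set only on the sources that need a fresh read. This is a sketch under assumptions. The `nocache` flag comes from the reply above, but the `plan_brief_requests` helper, the `url` key, and the example URLs are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def plan_brief_requests(
    urls: Iterable[str], refresh: Iterable[str]
) -> List[Dict[str, Any]]:
    """Build one request body per URL; only refreshed URLs bypass the cache."""
    refresh_set = set(refresh)
    requests: List[Dict[str, Any]] = []
    for url in urls:
        body: Dict[str, Any] = {"url": url}
        if url in refresh_set:
            # Force a fresh read for this source only.
            body["nocache"] = True
        requests.append(body)
    return requests


sources = [
    "https://example.com/docs",
    "https://example.com/changelog",
    "https://example.com/pricing",
]
brief_requests = plan_brief_requests(
    sources, refresh=["https://example.com/pricing"]
)
for body in brief_requests:
    print(body)
```

Passing every URL in `refresh` gives the "everything fresh" run; passing an empty list lets the whole brief answer from cache.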
That per-source nocache model is exactly what I was hoping for. For a weekly brief, I’d probably mark changelog and pricing pages fresh while leaving older docs cached, so the fan-out shape makes sense. Thanks for clarifying the boundary.
Tabstack by Mozilla
@hazy0 awesome! give it a spin and let us know what you think. you can leave a review here: https://www.producthunt.com/products/tabstack/reviews/new
Tabstack by Mozilla
@hazy0 follow-up question: what should @Tabstack by Mozilla launch next from your perspective? take the poll here: https://www.producthunt.com/p/tabstack/what-should-tabstack-launch-next