I'm launching webclaw here because I kept hitting the same problem while building agents and RAG workflows: getting a page is easy, but getting clean context from that page is not.
Raw HTML usually brings too much noise (nav, footers, cookie banners, duplicated layout text, scripts), while also missing JS-rendered content and varying in structure from site to site.
webclaw is my attempt at solving that layer: scrape/crawl/map websites and return clean markdown, JSON, structured extraction, summaries, diffs, and MCP/CLI-friendly output.
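To make the noise problem concrete, here is a minimal sketch of the kind of cleanup involved, using only the Python standard library. It is not webclaw's implementation or API: the names `NOISE_TAGS`, `TextExtractor`, and `clean_text` are hypothetical, and the choice of which tags count as noise is an assumption. It also does nothing about JS-rendered content, which needs a real browser or renderer.

```python
from html.parser import HTMLParser

# Assumption for this sketch: everything inside these tags is layout noise.
NOISE_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects visible text while skipping the subtrees of noise tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside at least one noise tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def clean_text(html):
    """Return visible text, one chunk per line, with exact duplicates dropped."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    seen = set()
    lines = []
    for chunk in parser.chunks:
        if chunk not in seen:  # drop duplicated layout text, keep first occurrence
            seen.add(chunk)
            lines.append(chunk)
    return "\n".join(lines)


sample = (
    "<html><body><nav>Home | About</nav>"
    "<main><h1>Title</h1><p>Body text.</p><p>Body text.</p></main>"
    "<script>var x = 1;</script><footer>Copyright</footer></body></html>"
)
print(clean_text(sample))
```

Even this toy version shows why the problem is harder than it looks: the tag list is a guess, cookie banners are usually plain `div`s that no tag filter catches, and the heading and paragraph structure is lost unless you also map tags to markdown.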