Launching today

LightCrawl

Lightweight, self-hostable Web scraping API & MCP server

6 followers

LightCrawl is an ultra-fast, lightweight, and self-hostable web scraping API and Model Context Protocol (MCP) server optimized for LLMs.

Features:
• Built with TypeScript and Playwright for reliable scraping.
• Seamless Brave Search integration for autonomous browsing.
• Formats messy web pages into clean, LLM-ready Markdown.
• 100% open-source and easy to self-host with Docker & Railway.

Turn any website into clean data for your AI agents in seconds!

Free

Launch tags: Open Source • Developer Tools • Artificial Intelligence

Launch Team

Maker

Hi Product Hunt community! 👋

I'm the creator of LightCrawl. As an engineering manager working deeply with infrastructure, I built LightCrawl because I needed a simpler, faster way to feed clean web data into AI agents without relying on heavy, expensive SaaS scraping platforms.

LightCrawl is a lightweight, 100% open-source alternative built with TypeScript and Playwright. It converts messy web pages into perfectly formatted, LLM-ready Markdown in seconds and includes full Model Context Protocol (MCP) server support right out of the box, making it seamless to connect with tools like Cursor or Claude Code.

It's designed to be completely stateless, self-hostable, and secure. You can spin it up instantly via Docker or deploy it to Railway with a single click.

I'd love to hear your thoughts, feedback, or feature requests!

8h ago

Mailwarm

Do you support caching so agents don’t keep rescraping the same pages and burning time?

23m ago

Maker

Hi @karimbenkeroum

Thanks for the great question.

Currently, LightCrawl does not support caching out of the box, as it is designed to be completely stateless and minimal.

However, your point is spot on—preventing redundant rescraping is critical for AI agent workflows to save both time and compute resources. Since LightCrawl already supports Redis for its distributed crawling queue, implementing an optional Redis-backed cache layer for scraped Markdown would be a very natural next step.

I've just created a GitHub issue to track this feature request:

https://github.com/yosuke1024/LightCrawl/issues/34

Please feel free to share any specific requirements or thoughts there (e.g., TTL, cache-busting behavior)!
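To make the TTL and cache-busting discussion concrete, here is a rough sketch of what a cache layer for scraped Markdown might look like. It is only an illustration: `MarkdownCache`, `scrapeWithCache`, and the `bypassCache` option are hypothetical names, not part of LightCrawl, and an in-memory `Map` stands in for the Redis-backed store so the example stays self-contained.

```typescript
// Hypothetical sketch: none of these names come from LightCrawl itself.
type CacheEntry = { markdown: string; expiresAt: number };

class MarkdownCache {
  private entries = new Map<string, CacheEntry>();
  private ttlMs: number;
  private now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Drop the #fragment so URLs that differ only by anchor share one entry.
  static keyFor(rawUrl: string): string {
    const i = rawUrl.indexOf("#");
    return i === -1 ? rawUrl : rawUrl.slice(0, i);
  }

  get(rawUrl: string): string | undefined {
    const key = MarkdownCache.keyFor(rawUrl);
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.markdown;
  }

  set(rawUrl: string, markdown: string): void {
    const key = MarkdownCache.keyFor(rawUrl);
    this.entries.set(key, { markdown, expiresAt: this.now() + this.ttlMs });
  }

  // Explicit cache-busting for a single URL; true if an entry was removed.
  invalidate(rawUrl: string): boolean {
    return this.entries.delete(MarkdownCache.keyFor(rawUrl));
  }
}

// Wraps any scrape function; `bypassCache` forces a fresh scrape
// and then refreshes the stored entry.
async function scrapeWithCache(
  cache: MarkdownCache,
  url: string,
  scrape: (url: string) => Promise<string>,
  opts: { bypassCache?: boolean } = {},
): Promise<string> {
  if (!opts.bypassCache) {
    const hit = cache.get(url);
    if (hit !== undefined) return hit;
  }
  const markdown = await scrape(url);
  cache.set(url, markdown);
  return markdown;
}
```

With Redis, the per-entry `expiresAt` bookkeeping could presumably be replaced by key expiry on write, which would keep the scraper processes themselves stateless.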

11m ago