Launching today

LightCrawl
Lightweight, self-hostable Web scraping API & MCP server
6 followers
LightCrawl is an ultra-fast, lightweight, and self-hostable web scraping API and Model Context Protocol (MCP) server optimized for LLMs.

Features:
• Built with TypeScript and Playwright for reliable scraping.
• Seamless Brave Search integration for autonomous browsing.
• Formats messy web pages into clean, LLM-ready Markdown.
• 100% open-source and easy to self-host with Docker & Railway.

Turn any website into clean data for your AI agents in seconds!



Mailwarm
Do you support caching so agents don’t keep rescraping the same pages and burning time?
Hi @karimbenkeroum
Thanks for the great question.
Currently, LightCrawl does not support caching out of the box, as it is designed to be completely stateless and minimal.
However, your point is spot on—preventing redundant rescraping is critical for AI agent workflows to save both time and compute resources. Since LightCrawl already supports Redis for its distributed crawling queue, implementing an optional Redis-backed cache layer for scraped Markdown would be a very natural next step.
I've just created a GitHub issue to track this feature request:
https://github.com/yosuke1024/LightCrawl/issues/34
Please feel free to share any specific requirements or thoughts there (e.g., TTL, cache-busting behavior)!
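
To make the TTL and cache-busting discussion concrete, here is a rough sketch of what such a cache layer might look like. It is not taken from the LightCrawl codebase: `MarkdownCache`, `scrapeWithCache`, and the `bypassCache` flag are hypothetical names, the store is an in-memory object rather than Redis, and everything is synchronous for brevity. A Redis-backed version would be asynchronous and could rely on Redis key expiry (for example `SET` with the `EX` option) instead of the manual timestamp check.

```typescript
// Hypothetical sketch: none of these names exist in LightCrawl itself.
type CacheEntry = { markdown: string; expiresAt: number };

class MarkdownCache {
  private entries: Record<string, CacheEntry> = {};
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be tested without real waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached Markdown, or undefined on a miss or an expired entry.
  get(url: string): string | undefined {
    const entry = this.entries[url];
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      delete this.entries[url]; // lazily evict expired entries
      return undefined;
    }
    return entry.markdown;
  }

  // Stores Markdown for a URL with a fresh TTL.
  set(url: string, markdown: string): void {
    this.entries[url] = { markdown, expiresAt: this.now() + this.ttlMs };
  }
}

// Wraps any scrape function. `bypassCache` is the cache-busting switch:
// it skips the lookup but still refreshes the stored entry.
function scrapeWithCache(
  cache: MarkdownCache,
  url: string,
  scrape: (url: string) => string,
  bypassCache: boolean = false,
): string {
  if (!bypassCache) {
    const hit = cache.get(url);
    if (hit !== undefined) return hit;
  }
  const markdown = scrape(url);
  cache.set(url, markdown);
  return markdown;
}
```

With this shape, an agent that requests the same URL twice within the TTL triggers only one real scrape, while a caller that needs fresh content passes `bypassCache = true`. Keeping the cache as an optional wrapper around the scrape function would leave the core scraper stateless.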