Crawlab

Web crawler admin platform for any framework or language

Crawlab is a Golang-based distributed web crawler management platform. It supports multiple languages, including Python, Node.js, Go, and Java, and multiple web crawler frameworks, including Scrapy, Puppeteer, and Selenium. It currently has 6k+ stars on GitHub.

Marvin.Z
Hi everyone, I am the author of Crawlab, an open-source distributed web crawler admin platform. The coolest part is that it can run spider programs written in any language, whether Python, Node.js, Java, or something else. It also supports node monitoring, task management, cron jobs, code editing, Git integration, and more, not to mention its beautiful frontend UI. Our goal is to make crawling easier. If you have multiple crawlers to manage, I highly recommend checking it out.
Marvin.Z
The idea for this product came from my full-time work. Our company had a web crawling system that scraped news articles from hundreds of websites. However, it was written in .NET Core and was very hard to use, given that we had a mix of Selenium, general Python scripts, and RPAs. We had to deal with compatibility issues for each type of spider, which was a huge problem. So I came up with the idea of managing every web crawler through a shell command. This approach is very flexible: in theory, a spider written in any programming language can be managed this way. I spent half a month finishing the first prototype, and it was later refactored in Golang to make it more robust. It has now reached 6.4k stars on GitHub. I believe it can solve the fundamental issues of managing web crawlers, which are ease of use and flexibility. I hope you like my product. Thanks! GitHub: https://github.com/crawlab-team/...
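
For readers curious what "managing a crawler with a shell command" can look like, here is a minimal sketch in Python. It is not Crawlab's actual implementation (Crawlab itself is written in Golang); the helper name `run_spider` and the example command are hypothetical and only illustrate the general idea that a spider in any language can be started, monitored, and recorded as long as it can be launched from a shell.

```python
import subprocess
import sys


def run_spider(command, cwd=None, timeout=None):
    """Run one spider as a shell command and collect its result.

    Hypothetical helper for illustration only; not part of Crawlab.
    """
    result = subprocess.run(
        command,
        shell=True,           # hand the command string to the system shell
        cwd=cwd,              # directory holding the spider's code, if any
        capture_output=True,  # keep stdout/stderr so they can be stored as task logs
        text=True,
        timeout=timeout,
    )
    return {
        "command": command,
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }


if __name__ == "__main__":
    # The Python interpreter stands in for any real spider command here,
    # such as "scrapy crawl news" or "node spider.js" (assumed examples).
    task = run_spider(f'"{sys.executable}" -c "print(\'scraped 3 items\')"')
    print(task["exit_code"], task["stdout"].strip())
```

Because the manager only sees a command string, an exit code, and the output streams, it does not need to know which language or framework the spider uses. The trade-off is that `shell=True` runs whatever string it is given, so commands should come only from trusted users.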