Crawlee

Crawlee helps you build reliable crawlers. Fast.

Crawlee is an intuitive, customizable open-source library for web scraping and browser automation. Quickly scrape data, store it, and avoid getting blocked with auto-generated human-like fingerprints, headless browsers, and smart proxy rotation.
This is the 3rd launch from Crawlee.

Crawlee for Python v1.0

Crawlee for Python is now out of beta with new features!
After months of development, polishing, and community feedback, Crawlee for Python is leaving beta and entering a production/stable development status. We are happy to announce Crawlee for Python v1.0.0. Check out all the new features and updates now!
Free

Saurav Jain

Hello PH community,

I am Saurav, Senior Developer Community Manager at Apify, the company building Crawlee.

I am happy to hunt Crawlee for Python v1.0 today. 🚀

We launched the beta version in July 2024, and got an amazing response from the Python community.

With more than 6,000 stargazers, thousands of users, and a lot of feedback, we are ready to roll out Crawlee for Python v1.0.

It has all of these features:

- Unified storage client system: less duplication, better extensibility, and a cleaner developer experience. It also opens the door for the community to build and share their own storage client implementations.
- Adaptive Playwright crawler: makes your crawls faster and cheaper, while still allowing you to reliably handle complex, dynamic websites. In practice, you get the best of both worlds: speed on simple pages and robustness on modern, JavaScript-heavy sites.
- New default HTTP client (`ImpitHttpClient`, powered by the Impit library): fewer false positives, more resilient crawls, and less need for complicated workarounds. Impit is also developed as an open-source project by Apify, so you can dive into the internals or contribute improvements yourself. You can also create your own instance, configure it to your needs (e.g. enable HTTP/3 or choose a specific browser profile), and pass it into your crawler.
- Sitemap request loader: makes it easier to start large-scale crawls when sitemaps already provide full coverage of the site.
- Robots exclusion standard: not only helps you build ethical crawlers, but can also save time and bandwidth by skipping disallowed or irrelevant pages
- Fingerprinting: each crawler run looks like a real browser on a real device. Using fingerprinting in Crawlee is straightforward: create a fingerprint generator with your desired options and pass it to the crawler.
- OpenTelemetry: monitor real-time dashboards or analyze traces to understand crawler performance, and integrate Crawlee more easily into existing monitoring pipelines.
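
To give a rough feel for what the sitemap and robots.txt features take off your plate, here is a minimal sketch that uses only the Python standard library, not Crawlee's own API. The sitemap, the robots.txt rules, the `my-crawler` user agent, and the helper names `sitemap_urls` and `allowed_urls` are all made up for illustration; a real crawl would fetch these files from the target site.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical inputs for illustration only; a real crawl would download them.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products/1</loc></url>
  <url><loc>https://example.com/admin/login</loc></url>
</urlset>
"""

ROBOTS_TXT = """User-agent: *
Disallow: /admin/
"""

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def allowed_urls(urls, robots_txt, user_agent):
    """Keep only the URLs that robots.txt allows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    seeds = sitemap_urls(SITEMAP_XML)
    for url in allowed_urls(seeds, ROBOTS_TXT, "my-crawler"):
        print(url)
```

The sitemap supplies the seed URLs up front, and the robots.txt check drops the disallowed `/admin/` page before any request is made, which is where the time and bandwidth savings come from.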

If you want to learn more, you can go ahead and read our launch blog.

We would love to hear your feedback and your thoughts on the new version! Feel free to comment your thoughts here and support us for the launch! Cheers :)

Okay, real talk, I’ve been scraping stuff in Python for years, using Scrapy, plain requests with/without realistic TLS fingerprints, Selenium, Playwright… basically everything you can think of. And somehow, no matter what, things always get messy when your project grows. But Crawlee for Python… wow. They finally hit that sweet spot. It’s high-level enough that retries, concurrency, and persistent storage just work, but it’s not so opinionated that you feel locked in. And now that it’s officially out of beta 🎉, I actually had fun getting a crawler running. Like, I could feel myself smiling while setting it up lol! Good job 👏

Shashwat Ghosh

@sauain @jancurn @mnmkng Congratulations, this is awesome...can't wait to get my hands dirty and experience it to improve my apify actors further.

Saurav Jain

Thanks for the support! Looking forward to hearing your feedback! @shashwat_ghosh_gtm

Shivay Lamba

Amazing launch by the team. I have used Crawlee for a while after coming across it from Saurav. It's powerful and extremely reliable, and the development experience using it in Python is impressive.

Saurav Jain

@shivaylamba  thanks for the support! :)