Crawly - Never write another web scraper

Turn websites into data in seconds. Crawly spiders and extracts complete structured data from an entire website.

God mode scraping collection.
Wow, thanks!
This list should be a collection!
Great list! Do any of these have -like capabilities to generate APIs?
Diffbot does =)
Hey ProductHunters! Crawly is a free tool I built that uses Diffbot's automatic article extraction API to turn web content into structured data. I've used it to create a centralized database of all of our content, but you could also use it for content audits and migrations, or to analyze your competitors' content. It's currently limited to 200 pages and supports only articles, but I plan to add support for scraping products in the future. Any other features you'd like to see added?
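For anyone curious what the article extraction step involves, here is a minimal Python sketch of preparing a request to Diffbot's Article API and flattening its JSON reply into one row per article. This is not Crawly's code. The endpoint URL, the `token` and `url` query parameters, and the response fields (`objects`, `pageUrl`, `title`, `author`, `text`) are assumptions based on Diffbot's v3 Article API and should be checked against its documentation; the helper names `build_article_request` and `to_rows` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed v3 Article API endpoint; verify against Diffbot's docs before use.
DIFFBOT_ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request(token: str, page_url: str) -> str:
    """Hypothetical helper: build the GET URL that asks Diffbot to extract one page.

    The token and target URL are passed as query parameters; urlencode
    percent-encodes the target URL so its own '?' and '=' survive intact.
    """
    query = urlencode({"token": token, "url": page_url})
    return f"{DIFFBOT_ARTICLE_ENDPOINT}?{query}"


def to_rows(response_body: str) -> list[dict]:
    """Hypothetical helper: flatten an assumed {"objects": [...]} reply into rows.

    Missing fields become None (or "" for text), so every row has the same
    keys and can go straight into a spreadsheet or database table.
    """
    payload = json.loads(response_body)
    rows = []
    for obj in payload.get("objects", []):
        rows.append({
            "url": obj.get("pageUrl"),
            "title": obj.get("title"),
            "author": obj.get("author"),
            "text": obj.get("text", ""),
        })
    return rows
```

Fetching is left out so the sketch stays offline: pass the URL from `build_article_request` to any HTTP client and hand the response body to `to_rows`. Keeping one flat row per article is what makes the output usable for the content audits and migrations mentioned above.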
Images?
You should really update that page to clarify the "articles only" caveat, especially since the tagline on your home page is "No rules required".
Will do! Like I mentioned, support for products, discussions, images, and videos is in the works.
Awesome!!! I've been looking for a replacement for Kimono since it was acquired by Palantir. I need something to power my Slack menu bot ;)
Looks great. Congrats on the launch! I can see some nice applications for this. One suggestion: I understand that scraping the data might take a while, but to manage expectations it would help to send an instant email saying it will take X minutes, or at least show a confirmation after the email is entered. I used it 10 minutes ago and am on the verge of tears that I still have no email... ;-)
Cannot GET /results/56ebdf254b0bfe03003ef0d8 :-(
Hey Neil, things should be back to normal. Servers were crumbling under the PH load :)
I didn't want to inundate people with unnecessary emails, but I don't want people crying either...
Tears are over. All working now. Very impressive. Good work - this will definitely be very useful.
I would love to be able to scrape sports stats easily!
Diffbot is such a useful service for manipulating published content - huge fan of this team.
How can I try this? I think there's a server issue; I'm getting a 503 error.
Sorry about that! Things should be back to normal *fingers crossed*.
The potential of this is really big and there are clearly needs across a lot of industries. Unfortunately, my tests with some ecommerce product pages had less than stellar results.
Thanks Erik. If you'd like, I can set you up with a Diffbot trial account for crawling ecommerce pages. Interested?
For sure... I think I have a call with you next week, btw :)
Awesome! Any plans for a command-line version in the future?
To be honest, probably not. That said, Crawly's big brother is Crawlbot, which has a fully supported API.