Dru Wynings

Crawly - Never write another web scraper

Turn websites into data in seconds. Crawly spiders and extracts complete structured data from an entire website.

Blaine Hatab
@datarade god mode scraping collection.
Cesare D. Forelli
@datarade wow, thanks!
Robin Wouters
@datarade This list should be a collection!
Nick Kwan
@datarade Great list! Do any of these have @Kimonify-like capabilities to generate APIs? @skrypt
Dru Wynings
@nwkwan diffbot does =)
Dru Wynings
Hey ProductHunters! Crawly is a free tool I built that uses Diffbot's automatic article extraction API to turn web content into structured data. I've used it to create a centralized database of all of our content, but you could also use it to do content audits / migrations or analyze your competitors' content. It's currently limited to 200 pages and only handles articles, but I plan to add support for scraping products in the future. Any other features you'd like to see added?
Luka
@druwynings images?
Erik Dungan
@druwynings You should really update that page to clarify the "articles only" caveat. Especially when the tagline on your home page is "No rules required"
Dru Wynings
@callmeed Will do! Like I mentioned, support for products, discussions, images, and videos is in the works.
Matt Gardner
Awesome!!! Looking for a replacement for Kimono (https://www.kimonolabs.com/) since they got acquired by Palantir. Need something to power my Slack menu bot ;)
Neil Cocker
Looks great. Congrats on the launch! I can see some nice applications for this. One suggestion - I understand that it might take a while to scrape the data, but an instant email to say it will be X minutes, or just a notification after email input would be good, to manage expectations. I used it 10 mins ago, and am on the verge of tears that I still have no email... ;-)
Neil Cocker
Cannot GET /results/56ebdf254b0bfe03003ef0d8 :-(
Dru Wynings
@neilcocker Hey Neil, things should be back to normal. Servers were crumbling under the PH load :)
Dru Wynings
@neilcocker I didn't want to inundate people with unnecessary emails, but I don't want people crying either...
Neil Cocker
@druwynings Tears are over. All working now. Very impressive. Good work - this will definitely be very useful.
Eric Iannaccone
I would love to be able to scrape sports stats easily!
Rob Spectre
Diffbot is such a useful service for manipulating published content - huge fan of this team.
Yugendhar Devale
How can I try this? I think there is some server issue Code 503.
Dru Wynings
@go_venky Sorry about that! Things should be back to normal *fingers crossed*.
Erik Dungan
The potential of this is really big and there are clearly needs across a lot of industries. Unfortunately, my tests with some ecommerce product pages had less than stellar results.
Dru Wynings
@callmeed Thanks Erik. If you'd like, I can set you up with a Diffbot trial account for crawling ecommerce pages. Interested?
Erik Dungan
@druwynings for sure ... I think I have a call with you next week btw :)
Sarthak Grover
Awesome! Any plans for a command-line version in the future?
Dru Wynings
@sarthakgrover To be honest, probably not. That being said, Crawly's big brother is Crawlbot (https://www.diffbot.com/products...) which has a fully-supported API (https://www.diffbot.com/dev/docs...)
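For anyone who wants a scriptable route in the meantime, here is a minimal Python sketch of calling the article extraction API mentioned earlier in the thread. Note that it targets single-page article extraction, not Crawlbot. The endpoint path, the `token` and `url` query parameters, and the `objects` / `title` / `author` / `text` response fields are assumptions based on Diffbot's v3 Article API, and the helper names are hypothetical, so check the official docs before relying on any of it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for Diffbot's v3 Article API; verify against the official docs.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request(token: str, page_url: str) -> str:
    """Return the GET URL for extracting one article page (hypothetical helper)."""
    return ARTICLE_ENDPOINT + "?" + urlencode({"token": token, "url": page_url})


def summarize(response: dict) -> list:
    """Keep a few fields from each extracted object (field names are assumed)."""
    return [
        {
            "title": obj.get("title"),
            "author": obj.get("author"),
            "chars": len(obj.get("text") or ""),
        }
        for obj in response.get("objects", [])
    ]


def fetch_article(token: str, page_url: str) -> dict:
    """Call the API over the network and decode the JSON body."""
    with urlopen(build_article_request(token, page_url), timeout=30) as resp:
        return json.load(resp)
```

With a valid token, `summarize(fetch_article(token, "https://example.com/post"))` would give one small dict per extracted article, which is easy to dump to CSV or pipe into another tool.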