ScrapeOwl is a simple and powerful web scraping API that manages proxies, headless browsers, and HTML parsing. We develop the best tooling for you to get what you need by removing the complexity. Simply specify the website and the element you want.
Hi Folks 🙌
Thank you @musharofchy for hunting 🙏
I am super excited to share our first ever product with you all 🎉🥳😍
Web scraping is difficult and scaling a scraper is hard, so we are excited to announce https://scrapeowl.com
Simple and powerful scraping API with:
JavaScript Rendering
Proxies
Headless Chrome
Extracting data from the page
ScrapeOwl turns a web page into formatted JSON with its powerful HTML parser, which supports both CSS selectors and XPath.
The goal is to make scraping as easy as using a simple API.
Thanks,
Athar
😍 Anything for Product Hunters?
We are offering a 30% recurring lifetime discount on any plan. Apply coupon code PHSPECIAL during checkout.
Congrats on the launch.
Heads up that the "Try the Extraction API" demo does not seem to work. I entered https://aboutsnack.com, it loaded for a second, and then nothing happened. I tried a few other URLs, so it does not seem to be URL specific.
@kuizinas Thanks for trying it out, that's rather strange — I just tested it with a few URLs and it works fine. Here is the example response from the URL you mentioned.
The demo box on the homepage can take up to 1 minute to show results, depending on how long it takes to extract the details from the page.
Let me know if you need to test the demo with actual elements.
Thanks
@nizamaniathar Oh I see what's going on. I simply typed "aboutsnack.com". I looked at the API response and that fails with "The url format is invalid."
@kuizinas Thanks for pointing that out, it is a minor oversight on my part in the site's demo. I'm fixing it right now so that it always asks for a URL with the prefix.
The validator requires the URL to begin with either "http://" or "https://"
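A check like the one described can be sketched in a few lines of Python. This is only an illustration, not ScrapeOwl's actual validator: the function names `has_required_prefix` and `validate_url` are made up here, and only the error text is taken from the API response quoted above.

```python
from urllib.parse import urlparse


def has_required_prefix(url: str) -> bool:
    """Return True only for URLs that start with http:// or https:// and name a host."""
    parsed = urlparse(url.strip())
    # urlparse lowercases the scheme, so "HTTPS://..." is accepted too.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_url(url: str) -> str:
    """Hypothetical helper: return the trimmed URL, or raise if the prefix is missing."""
    if not has_required_prefix(url):
        # Message copied from the API response quoted in this thread.
        raise ValueError("The url format is invalid.")
    return url.strip()
```

With this sketch, `aboutsnack.com` is rejected because it has no scheme, while `https://aboutsnack.com` passes.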
Thanks!
@georgewillaman All of the solutions that you mentioned are simple APIs that return raw HTML.
We return parsed data in JSON format organized by element (h1, p, etc.). We also return the HTML if a user wants to parse it on their own.
@georgewillaman @nizamaniathar Hi Athar, I will take a look at ScrapeOwl. I am not sure returning the data as JSON like that is useful, as it means someone needs to build a page parser over JSON when it is much easier to use XPath or CSS on the HTML. I am interested to know the use case for this and how it would be more useful than plain HTML.
@georgewillaman @pgordon Hi Paul,
ScrapeOwl does not return JSON by default. You send the selectors (CSS or XPath) along with the request; it scrapes the page, extracts the elements matching those selectors, and returns them as JSON, so you do not need to parse the HTML yourself.
But if you want to do the parsing yourself, you can leave the selectors empty and it will return the full HTML of the page.
You can also have a look at our docs to better understand this: https://scrapeowl.com/docs
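To make the two modes concrete, here is a small local sketch in Python. It is not ScrapeOwl's API or response format: the `extract` function, the `TagTextCollector` class, and the `data` and `html` keys are assumptions for illustration, and it matches plain tag names (h1, p) with the standard library rather than full CSS selectors or XPath.

```python
import json
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collects the text found inside each requested tag name (e.g. h1, p)."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.results = {tag: [] for tag in wanted}
        self._open = []  # requested tags currently open, as (tag, text_parts)

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self._open.append((tag, []))

    def handle_data(self, data):
        # Text counts toward every requested tag that is currently open.
        for _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        # Close the innermost open tag with this name, if any.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                _, parts = self._open.pop(i)
                self.results[tag].append("".join(parts).strip())
                break


def extract(html: str, elements: list) -> str:
    """Return JSON with text per requested element, or the full HTML if none are given."""
    if not elements:
        # No elements requested: hand back the page so the caller can parse it.
        return json.dumps({"html": html})
    collector = TagTextCollector(elements)
    collector.feed(html)
    collector.close()
    return json.dumps({"data": collector.results})
```

Calling `extract(page, ["h1", "p"])` returns the text grouped by element, and `extract(page, [])` returns the page untouched. Elements that are never closed in the markup are skipped by this simplified collector.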
I hope that helps.
You can reach out to me via live chat on our website if you need help setting up a demo.
Thanks
Hi @valentine_erokhin
Parsehub is a no-code tool that requires you to set up templates with the elements you want extracted.
ScrapeOwl is an API where you simply provide the URL along with a list of elements (h1, p, etc.) and it returns the data.
We also take care of proxies and headless Chrome, in addition to extracting the data from the page.
There are certainly plenty of scraping products out there. The question is, how do you plan to keep websites from getting upset with you for scraping them? Some, like Facebook, frequently file lawsuits against those caught scraping their site.
https://techcrunch.com/2020/10/0...
Interesting that there are so many almost identical services launching like this. I already use scrapingbee, scraperapi and others.
I am not complaining; it is good for me, as I use more than one of these as a backup, and competition is good. I will check out ScrapeOwl (though I am a bit concerned that I am just using the same underlying services under a different name).
Are all these services built on another lower level API or service?
I tried that out and it worked great overall! Gonna send some ideas and feedback to you as I play with it a bit further.