Kadoa uses AI to explore, extract, and transform web data. Save hours of time setting up and creating web scrapers. Extract the data you need effortlessly with Kadoa.
Hi PH! 👋
We got frustrated with the time and effort required to code and maintain custom web scrapers, so we built an LLM-based solution that can extract data from any website in the format you want. AI should automate tedious and uncreative work, and web scraping definitely fits this description.
We're leveraging large language models to semantically understand websites and generate the DOM selectors for them. Using GPT for every data extraction, as most comparable tools do, would be way too expensive and very slow, but using LLMs to generate the scraper code and subsequently adapt it to website modifications is highly efficient.
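To make the cost argument concrete, here is a minimal sketch of the "generate once, reuse, regenerate when the site changes" idea. It is not Kadoa's implementation: `CachedScraper`, `extract_with_selectors`, and `fake_llm` are hypothetical names, the "selector" is just a class name matched with a regex, and the model call is replaced by a stand-in function.

```python
import re
from typing import Callable, Dict, Optional

# Assumption: a "selector map" is field name -> class name,
# a simplified stand-in for a real CSS selector.
SelectorMap = Dict[str, str]


def extract_with_selectors(html: str, selectors: SelectorMap) -> Optional[Dict[str, str]]:
    """Cheap path: apply stored selectors; return None if any field is missing."""
    row = {}
    for field, cls in selectors.items():
        match = re.search(r'class="%s"[^>]*>([^<]*)<' % re.escape(cls), html)
        if match is None:
            return None  # the selector no longer matches, so the site probably changed
        row[field] = match.group(1).strip()
    return row


class CachedScraper:
    """Calls the (expensive) selector generator only when cached selectors fail."""

    def __init__(self, generate_selectors: Callable[[str], SelectorMap]):
        self._generate = generate_selectors
        self._cache: Dict[str, SelectorMap] = {}
        self.llm_calls = 0  # counts how often the expensive path ran

    def scrape(self, site: str, html: str) -> Optional[Dict[str, str]]:
        selectors = self._cache.get(site)
        if selectors is not None:
            row = extract_with_selectors(html, selectors)
            if row is not None:
                return row
        # Cache miss or broken selectors: regenerate and store them.
        self.llm_calls += 1
        selectors = self._generate(html)
        self._cache[site] = selectors
        return extract_with_selectors(html, selectors)


def fake_llm(html: str) -> SelectorMap:
    """Hypothetical stand-in for the model: picks whichever price class is present."""
    return {"price": "price-v2" if "price-v2" in html else "price"}
```

With this shape, repeated scrapes of an unchanged page only ever use the cheap path, and the expensive generator runs again only after a stored selector stops matching.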
Try it out for free on our playground https://kadoa.com/playground and let us know what you think! And please don't bankrupt us :)
Here are a few examples:
- Product Listings (Specialized Bikes) https://www.kadoa.com/playground...
- Financial Data (Yahoo Finance) https://www.kadoa.com/playground...
- Player Stats (LeagueOfGraphs) https://www.kadoa.com/playground...
🛠️ How it works 🛠️ (the playground uses a simplified version of this):
- Loading the website: automatically decide what kind of proxy and browser we need
- Analysing network calls: Try to find the desired data in the network calls
- Preprocessing the DOM: remove all unnecessary elements, compress it into a structure that GPT can understand
- Slicing: Slice the DOM into multiple chunks while still keeping the overall context
- Selector extraction: Use GPT (or Flan-T5) to find the desired information with the corresponding selectors
- Data extraction in the desired format
- Validation: Hallucination checks and verification that the data is actually on the website and in the right format
- Data transformation: Clean and map the data (e.g. if we need to aggregate data from multiple sources into the same format). LLMs are great at this task too
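As a rough illustration of the preprocessing, slicing, and validation steps above, here is a small standard-library sketch. It is only an assumption about what such steps could look like, not the actual pipeline: `DomCompressor`, `compress`, `slice_lines`, and `validate` are hypothetical names, and the set of dropped tags is an arbitrary choice.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Assumption: these tags never carry data we want to extract.
DROP_TAGS = {"script", "style", "svg", "noscript"}


class DomCompressor(HTMLParser):
    """Keeps tag names, class attributes, and visible text; drops everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []
        self._skip_depth = 0  # > 0 while inside a dropped tag

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth:
            return
        cls = dict(attrs).get("class")
        self.lines.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        if tag in DROP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.lines.append(f'"{text}"')


def compress(html: str) -> List[str]:
    """Preprocessing: turn raw HTML into a compact line-per-node outline."""
    parser = DomCompressor()
    parser.feed(html)
    parser.close()
    return parser.lines


def slice_lines(lines: List[str], max_lines: int, context: str) -> List[List[str]]:
    """Slicing: fixed-size chunks, each prefixed with a shared context line."""
    return [[context] + lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


def validate(record: Dict[str, str], page_text: str) -> Dict[str, bool]:
    """Validation: a value counts as grounded only if it appears verbatim on the page."""
    return {field: value in page_text for field, value in record.items()}
```

A real hallucination check would also need format checks and fuzzy matching; the verbatim substring test here only shows the basic idea of verifying that extracted values are actually on the page.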
The vision is a fully autonomous, cost-efficient, and reliable web scraper :)
Really cool! It was only a matter of time before we got here. Can it perform actions like logins or element clicks to access gated data yet? That's the extra golden ticket!
@derekwilliamson We support behind-login scraping at a small scale. It's important to know the terms and conditions of services you scrape behind a login.
Really sad that we could not test it!
What's the idea of sharing the product if it's not ready for testing yet?
You could maybe add an alpha test with a code for Product Hunt users.
@eonpilot Yep. We use rotating proxies for high-frequency scraping.
Looks like such a useful tool! Especially for solving the long-tail issues of web scraping. I can imagine multiple use cases coming up with API access to this tool.
This is really cool, I just conducted a particular scraping job that I was struggling with, and it went on without a hitch. Yet to try on other datasets since I was taking it for a trial, but this should simplify my workflow by a lot. Thanks and congrats on the launch