I'm looking for a tool that can extract headlines, abstracts, authors, etc. I was impressed with Kimono Labs, but then it got acquired by Palantir.
- Import is slick. Use its WYSIWYG editor to select the elements of the page/site you want to track, and it'll turn them into an API you can use elsewhere.
- (Lakshan Perera, maker of this product) Page.REST allows you to extract page titles, descriptions, Open Graph data, or any content on a page using CSS selectors. You get a JSON response which can be consumed by or integrated with many other tools. (Disclaimer: I'm the creator)
- You can still download the "kimonify" Chrome extension by following this link, and you can still build CSV tables and output scraped data in JSON format on many sites. The only catch is that you can't schedule routine crawls or automate pagination. But to get the data from any given page, just click on the <> icon for the "raw data view", click either JSON or CSV, then highlight the data, copy it, and paste it into your Google Sheet, Airtable base, etc. It's admittedly a clumsy process, but it still saves me a ton of time.
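
For anyone who only needs the basics from the original question (headline, abstract, author), here is a rough do-it-yourself sketch in Python. It is not how any of the tools above work internally, and it does not use CSS selectors; it just reads the `<title>`, the `description` and `author` meta tags, and any Open Graph (`og:`) tags with the standard library and returns them as JSON. The names `MetaExtractor`, `extract_metadata`, and `SAMPLE_HTML` are made up for this example, and it assumes the page actually exposes that metadata in its markup.

```python
import json
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects <title>, meta description/author, and Open Graph tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.result = {"title": "", "description": None, "author": None, "og": {}}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            content = attrs.get("content")
            name = (attrs.get("name") or "").lower()
            prop = (attrs.get("property") or "").lower()
            if name in ("description", "author"):
                self.result[name] = content
            elif prop.startswith("og:"):
                # Store "og:title" under the key "title", and so on.
                self.result["og"][prop[3:]] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so accumulate it.
        if self.in_title:
            self.result["title"] += data


def extract_metadata(html):
    """Return a dict of page metadata parsed from an HTML string."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    parser.result["title"] = parser.result["title"].strip()
    return parser.result


# Hypothetical page used only to demonstrate the function.
SAMPLE_HTML = """
<html><head>
  <title>Example headline</title>
  <meta name="description" content="A short abstract of the article.">
  <meta name="author" content="Jane Doe">
  <meta property="og:title" content="Example headline (OG)" />
  <meta property="og:type" content="article" />
</head><body><p>Body text.</p></body></html>
"""

if __name__ == "__main__":
    print(json.dumps(extract_metadata(SAMPLE_HTML), indent=2))
```

To run it against a live page, you could pass in HTML fetched with `urllib.request.urlopen(url).read().decode()`; scheduling and pagination would still be up to you.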