Instaparser

Cleanly pull content from any website

#3 Product of the Week, April 06, 2016
Discussion
Brian from Instapaper here! Over the past few years we've received a significant number of requests from developers for access to Instapaper's parser. Yesterday we launched Instaparser, an API that exposes Instapaper's parser. Instaparser is a paid service, but there's a free tier at https://www.instaparser.com/sign... that can be used for testing or quick weekend hacks. Personally, this is the first developer-focused product I've launched, and I'm very excited to get it out into the community and see what people do with it.
@bthdonohue This looks very interesting. I am not trying to be negative here, but I am just curious (as a potential customer): how do you guys compare to open source (and frankly: popular) solutions such as Newspaper? https://github.com/codelucas/new...
@cam_pj Hi PJ! I'm unfamiliar with Newspaper, so I just looked through the source code to get a feel for how it does article parsing. It looks like a great open source parsing framework, and it appears to be at least somewhat influenced by the Readability parser (similar paragraph scoring, checking sibling nodes, etc.). I think the major difference is that, to cover as many domains as possible, you need to implement and maintain a flexible system for domain-by-domain parser configurations. We have a dedicated support/community person who is trained to resolve parsing issues on a domain-by-domain basis when they come up, and we use a variety of signals to make sure the parser stays up to date: reports from the "Report a Problem" button in the Instapaper app, scheduled integration tests against our most popular domains, and recorded failures from the Instaparser API. We combine those signals with domain popularity to prioritize fixes to parsing issues, both proactively and reactively. Building an accurate parser requires constant maintenance from a dedicated team, and while I'm sure there are open source projects that reach 65% to 75% accuracy, getting to 90%+ accuracy is the really tricky bit. Hope that's helpful!
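To make the "domain-by-domain parser configurations" idea concrete, here is a minimal sketch of how such a layer might sit in front of a generic, Readability-style scorer. It is not Instaparser's implementation: `ParserConfig`, `DOMAIN_CONFIGS`, `GENERIC`, `config_for`, and the selector values are all hypothetical names chosen for illustration. The assumption is that hand-maintained overrides are keyed by domain, that subdomains inherit their parent domain's rules, and that any host without an override falls back to the generic heuristics.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ParserConfig:
    """Hypothetical per-domain overrides; None means 'use generic heuristics'."""
    body_selector: Optional[str] = None
    strip_selectors: tuple[str, ...] = ()


# Hypothetical hand-maintained overrides, keyed by domain.
DOMAIN_CONFIGS: dict[str, ParserConfig] = {
    "example.com": ParserConfig(
        body_selector="div.article-body",
        strip_selectors=("aside.related",),
    ),
}

# Fallback: no overrides, so a generic paragraph-scoring parser would run.
GENERIC = ParserConfig()


def config_for(url: str) -> ParserConfig:
    """Return the most specific matching domain config, else the generic one."""
    # urlsplit(...).hostname is already lowercased; None for URLs without a host.
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Walk from the full host up toward the parent domain:
    # a.b.example.com -> b.example.com -> example.com (never the bare TLD).
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_CONFIGS:
            return DOMAIN_CONFIGS[candidate]
    return GENERIC
```

Keeping the override table separate from the generic scorer is what lets a support person fix one site's extraction without touching the heuristics every other domain relies on.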
@bthdonohue Understood, that makes sense. Like you said, the last 20% is always the tricky part of data extraction. Thanks for clarifying.
Pretty cool way to pull out the parts of an article. Go to the preview and test it out.
Great news! I'm a huge fan of Instapaper, so I'm very excited to see more products built on Instaparser. Special thanks for the free tier :)
@suholet Thanks Dmitry! I was really impressed with the Yandex browser when it came out in 2014. I haven't used it much since, but I loved the innovations in the browser interface. Nice to have some mutual admiration! :-)
@bthdonohue Brian, I'm really impressed that you've heard of our browser :) How did you find it?
@suholet I think it was this article from TNW in late 2014: http://thenextweb.com/apps/2014/...
@bthdonohue Haha, thanks for the link :) Meh... Russians are bad at promoting their products :)
Well done, Brian! It's a really useful service. Parsing is super-valuable for good mobile UX, and Instaparser does it speedily, cheaply, and with good documentation.