Advice Needed - Web Scraping

Paul Woodthorpe
5 replies
We have a lot of app developers on here, so I am hoping to bend an ear on a solution. Is it possible for a web scraper to get past a login, and how would you go about it? Our scraper can scrape the public front end of a website, but we are struggling to get it to scrape a site that requires you to register before you can see the information we want. For example, we can scrape product prices if they are visible to the public, but not if you have to register on the site first to see the products. Signing up is not an issue; it's getting the scraper to then use those registered details to scrape. Any tips or advice would be excellent, and if there is a tool that does this kind of thing, it would be great if you could point me to it. Many thanks.

Replies

Well, one solution is to log in, grab the auth headers, cookies, or whatever else the site uses to identify your session, and then send that along with every scrape request you make.
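Here is a rough sketch of that idea in Python, using only the standard library. It assumes the site has a plain HTML form login that sets a session cookie; the function names, URLs, and form field names are placeholders for illustration, not taken from any real site. Logins that rely on CSRF tokens or JavaScript would need extra steps that this does not cover.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def log_in(login_url, form_fields, timeout=30):
    """POST the login form once and return an opener that keeps the session cookies."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # The field names (e.g. "username", "password") depend on the site's login form.
    body = urllib.parse.urlencode(form_fields).encode("utf-8")
    # Any Set-Cookie headers in the login response are stored in the jar.
    with opener.open(login_url, data=body, timeout=timeout) as response:
        response.read()
    return opener


def fetch(opener, url, timeout=30):
    """GET a page through the logged-in opener; stored cookies are sent automatically."""
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical usage (placeholder URLs and credentials):
# opener = log_in("https://example.com/login",
#                 {"username": "me@example.com", "password": "secret"})
# html = fetch(opener, "https://example.com/members/prices")
```

The point of reusing one opener is that the scraper logs in once and every later request carries the same cookies, which is the "send that along with every scrape request" part.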
Misha Krunic
Hey Paul! This is actually my specific area of expertise! It's what my main product - https://www.producthunt.com/post... - is for! I don't mean to sound overconfident, but it's very advanced, and bypassing logins is something we're able to handle for many of our clients. Also, neither quantitative nor qualitative data is an issue! I'm sure it's exactly what you're looking for, but if you have any more questions, feel free to ask!
Paul Woodthorpe
@price2spy I will take a look. We do have a scraper in place but my developer has had issues trying to bypass logins. Thanks for giving me something to look at.
Alexey Olkhovoy
Hi. Have you been able to find a suitable solution? I'm currently looking for a way to scrape public sites with themed articles. I would like it to come with an API out of the box, so I can pull the data directly into the site without writing server-side code. So far I have liked ParseHub (very powerful, but with a horrible interface) and SimpleScrapper (easy to use, but functionally limited). Is there anything else?
Y Gao
I think sites intentionally don't let people scrape information from pages behind a login (for security and privacy). But if you really want to do it, @Casper S.'s method may be the only legal way to do it.