Michael Ozersky

SiteRows example #3

Hello everyone. Welcome back to my siterows.com series.

🔹 Quick reminder of what this app does: it lets you scrape web content with SQL, the same way you would query a database. Creating a FREE account unlocks automation features and higher usage limits.

🔹 Today's question: "What if I'm using something like Selenium to navigate somewhere (perhaps logging into a site), and then I want to query the page?"

To accommodate this, I just added the ability to pass a raw HTML string to the /Scrape API endpoint instead of a URL. Below is a Python example, along with its response, that demonstrates this new feature. I'm also saving all examples in this GitHub repo: github.com/sgt-oz/SiteRows.
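A minimal sketch of building that request with only the Python standard library. Only the `url` and `html` payload keys come from this post; the endpoint URL, the `query` field carrying the SQL, the JSON response format, and the helper names `build_scrape_request()` and `send_scrape_request()` are assumptions for illustration, not the documented SiteRows API. Any authentication header the real endpoint needs is not shown.

```python
import json
import urllib.request

# Hypothetical endpoint URL; check the SiteRows docs/repo for the real one.
SCRAPE_ENDPOINT = "https://siterows.com/api/Scrape"


def build_scrape_request(query, *, url=None, html=None, endpoint=SCRAPE_ENDPOINT):
    """Build a POST request for the /Scrape endpoint (hypothetical helper).

    Pass exactly one of `url` (SiteRows fetches the page itself) or
    `html` (SiteRows queries the raw HTML string you already have).
    The "query" field name is an assumption, not confirmed API.
    """
    # Requiring exactly one source is this sketch's own rule.
    if (url is None) == (html is None):
        raise ValueError("pass exactly one of url= or html=")
    payload = {"query": query}
    if url is not None:
        payload["url"] = url
    else:
        payload["html"] = html
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_scrape_request(request, timeout=30):
    """Send the request; assumes (unconfirmed) that the response body is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Switching from URL mode to HTML mode only changes the keyword argument, e.g. `build_scrape_request(sql, html=page_html)` instead of `build_scrape_request(sql, url="mywebpage.com")`. The SQL text itself depends on SiteRows' query syntax, which isn't covered here.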

Basically, here is what's happening:
* The function fetch_logged_in_html() uses Selenium to log into a site and return the home page's HTML as a string.
* That string is passed to the /Scrape API endpoint. Instead of sending {"url": "mywebpage.com"} as the payload, you just pass:
{"html": "<body><a href=...>.....</body>"}
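Here is one way the fetch_logged_in_html() step could look. This is a hypothetical sketch, not the function from the repo: it assumes a login form whose fields are named `username` and `password`, and it waits for the login to finish by polling `driver.current_url` until the browser leaves the login page (a simple heuristic in place of Selenium's explicit-wait helpers).

```python
import time


def fetch_logged_in_html(driver, login_url, username, password, timeout=10.0):
    """Log in through a browser session and return the post-login page's HTML.

    `driver` is any Selenium WebDriver, e.g. selenium.webdriver.Chrome().
    The field names "username" and "password" are hypothetical; adjust
    them to match the target site's login form.
    """
    driver.get(login_url)
    # Remember where we landed, in case the login URL itself redirects.
    start_url = driver.current_url

    # "name" is the string value of Selenium's By.NAME locator strategy.
    driver.find_element("name", "username").send_keys(username)
    password_field = driver.find_element("name", "password")
    password_field.send_keys(password)
    password_field.submit()

    # Poll until the browser navigates away from the login page.
    deadline = time.monotonic() + timeout
    while driver.current_url == start_url:
        if time.monotonic() > deadline:
            raise TimeoutError("login did not navigate away from the login page")
        time.sleep(0.2)

    # page_source is the rendered HTML of the page the browser is now on.
    return driver.page_source
```

Passing the driver in keeps browser setup and teardown with the caller: you would create it with `selenium.webdriver.Chrome()`, call this function inside a `try`/`finally` that ends with `driver.quit()`, and put the returned string into the `{"html": ...}` payload. It also lets the function be exercised with a stand-in object instead of a real browser.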

...and that's pretty much it. SiteRows queries your HTML and returns results just as if you had passed it a page URL.

Thanks, everyone. Please don't hesitate to reach out with any questions or feedback.
