Geekflare API - Scrape, Screenshot, and extract LLM-ready data

Geekflare

•5mo ago

Turn the chaotic web into clean Markdown. Access a suite of APIs to feed your legacy and AI apps without the infrastructure overhead.

Replies

Geekflare

Maker

Hi Product Hunt! 👋

I am super excited to re-introduce you to Geekflare API. We realized that while APIs for scraping existed, most weren't optimized for the new wave of AI agents and LLMs. We rebuilt it with a specific goal: Data quality for AI.

If you are building RAG pipelines or training models, you know that HTML noise is a nightmare. Here is how we solve that:

Clean Markdown: We convert web pages into Markdown, so your LLM can actually understand the content without the clutter.

Rich Metadata: Get structured JSON metadata to provide context to your models.

Enterprise Ready: We now offer custom API solutions if you need specific infrastructure adjustments.

👀 Coming Soon: We are heads-down building Search and AI-driven APIs to make data retrieval even smarter.

I’d love to hear your feedback on the output quality. Let me know what you think!
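For readers wondering how Markdown plus JSON metadata might feed a RAG pipeline, here is a minimal sketch. It does not call the Geekflare API or reflect its actual response format: the function name `chunk_markdown`, the chunk layout, and the sample page and metadata are all made up for illustration. It only shows the general idea of splitting Markdown on headings and attaching the page metadata to each chunk.

```python
import json
import re


def chunk_markdown(markdown: str, metadata: dict) -> list[dict]:
    """Split Markdown on ATX headings and attach page metadata to every chunk.

    Hypothetical helper: the chunk shape is an assumption, not an API contract.
    Headings inside fenced code blocks are not special-cased in this sketch.
    """
    chunks = []
    heading = None
    lines = []

    def flush():
        # Emit the text collected so far, skipping empty sections.
        text = "\n".join(lines).strip()
        if text:
            chunks.append({"heading": heading, "text": text, "metadata": metadata})

    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            flush()
            heading = match.group(1).strip()
            lines.clear()
        else:
            lines.append(line)
    flush()
    return chunks


# Made-up sample input standing in for a scraped page and its metadata.
page_markdown = "# Pricing\nFree tier available.\n\n## Limits\n100 requests per day.\n"
page_metadata = json.loads('{"title": "Example", "url": "https://example.com"}')

chunks = chunk_markdown(page_markdown, page_metadata)
for chunk in chunks:
    print(chunk["heading"], "|", chunk["text"], "|", chunk["metadata"]["url"])
```

Carrying the same metadata on every chunk means a retriever can still show the source title and URL even when only one section of the page is returned.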

5mo ago

@chandankumar congrats on the launch! Is there default HTML cleaning, or is it possible to customize it?

5mo ago

Geekflare

Maker

Thank you, Austin.

We are going to release default HTML cleaning next week.

5mo ago

@chandankumar Clean Markdown output for LLMs is exactly what RAG pipelines need. Parsing HTML and removing noise manually is a pain.

Structured metadata on top of that is useful for adding context to chunks.

How does it handle JavaScript-heavy sites? Does it wait for dynamic content to load, or just grab the initial HTML?

5mo ago

Geekflare

Maker

@topfuelauto thank you!

We use a headless browser and wait for dynamic content to load, with a 30-second timeout. If you encounter any issues, please let me know.
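To make the "wait, but give up after 30 seconds" behavior concrete, here is a rough sketch of the general polling pattern. This is not Geekflare's implementation and it drives no real browser: `wait_for` and the `page_is_ready` stand-in are hypothetical names, and the condition is simulated so the example runs on its own.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met in time, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated "dynamic content": reports ready on the third check.
checks = {"count": 0}


def page_is_ready() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


loaded = wait_for(page_is_ready, timeout=30.0, interval=0.01)
print("content loaded" if loaded else "timed out")
```

In a real headless-browser setup the condition would typically be something like "a given selector exists" or "the network has gone idle", and a timeout would usually surface as an error to the caller rather than a silent `False`.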

5mo ago