Chandan Kumar

Geekflare API - Scrape, Screenshot, and extract LLM-ready data

Turn the chaotic web into clean Markdown. Access a suite of APIs to feed your legacy and AI apps without the infrastructure overhead.

Chandan Kumar
Hi Product Hunt! 👋 I am super excited to re-introduce you to Geekflare API.

We realized that while APIs for scraping existed, most weren't optimized for the new wave of AI agents and LLMs. We rebuilt it with a specific goal: data quality for AI. If you are building RAG pipelines or training models, you know that HTML noise is a nightmare. Here is how we solve that:

Clean Markdown: We convert web pages into Markdown, so your LLM can actually understand the content without the clutter.
Rich Metadata: Get structured JSON metadata to provide context to your models.
Enterprise Ready: We now offer custom API solutions if you need specific infrastructure adjustments.

👀 Coming Soon: We are heads-down building Search and AI-driven APIs to make data retrieval even smarter.

I’d love to hear your feedback on the output quality. Let me know what you think!
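
For readers wondering what "HTML noise" removal looks like in practice, here is a minimal sketch of the general idea using only Python's standard library. It is not Geekflare's implementation: the class name, the `html_to_markdown` helper, and the list of tags treated as noise are all assumptions made for illustration, and a production converter would handle far more cases (links, tables, nested lists).

```python
from html.parser import HTMLParser

# Tags whose contents this sketch discards as noise (an assumption,
# not the rule set of any real service).
NOISE_TAGS = {"script", "style", "nav", "footer", "aside"}

# Block-level tags this sketch understands, mapped to a Markdown prefix.
BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}


class MarkdownSketch(HTMLParser):
    """Collects text from a few block tags and skips everything in NOISE_TAGS."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished Markdown blocks
        self.current = []     # text pieces of the block being built
        self.prefix = ""      # Markdown prefix for the current block
        self.noise_depth = 0  # greater than 0 while inside a noise tag

    def flush(self):
        # Collapse runs of whitespace and store the block if it has any text.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.current = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif self.noise_depth == 0 and tag in BLOCK_PREFIX:
            self.flush()
            self.prefix = BLOCK_PREFIX[tag]

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            if self.noise_depth > 0:
                self.noise_depth -= 1
        elif self.noise_depth == 0 and tag in BLOCK_PREFIX:
            self.flush()

    def handle_data(self, data):
        # Keep text only when we are outside every noise tag.
        if self.noise_depth == 0:
            self.current.append(data)


def html_to_markdown(html: str) -> str:
    """Hypothetical helper: return a rough Markdown rendering of `html`."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.blocks)


if __name__ == "__main__":
    page = (
        "<body><nav><a href='/'>Home</a></nav>"
        "<h1>Title</h1><p>Hello <b>world</b>.</p>"
        "<script>track()</script></body>"
    )
    print(html_to_markdown(page))
```

The navigation link and the script body are dropped, while the heading and paragraph survive as Markdown blocks that are easier to chunk for a RAG pipeline.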
Austin Heaton

@chandankumar congrats on the launch! Is there default HTML cleaning, or is it possible to customize it?

Chandan Kumar

Thank you, Austin.

We are going to release default HTML cleaning next week.

mostafa kh

@chandankumar Clean Markdown output for LLMs is exactly what RAG pipelines need. Parsing HTML and removing noise manually is a pain.

Structured metadata on top of that is useful for adding context to chunks.

How does it handle JavaScript-heavy sites? Does it wait for dynamic content to load, or just grab the initial HTML?

Chandan Kumar

@topfuelauto thank you!

We use a headless browser and wait for dynamic content to load, with a 30-second timeout. If you encounter any issues, please let me know.
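
To illustrate the "wait for dynamic content, but give up after a timeout" pattern described above, here is a small generic sketch in Python. It is not how Geekflare implements rendering; `wait_for` and its parameters are hypothetical names, and the 30-second default simply mirrors the timeout mentioned in the reply. In a real headless browser, the condition would be something like "the target element exists in the DOM".

```python
import time


def wait_for(condition, timeout=30.0, interval=0.25):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


if __name__ == "__main__":
    # Simulate a page whose content becomes available on the third check.
    checks = {"count": 0}

    def content_loaded():
        checks["count"] += 1
        return checks["count"] >= 3

    wait_for(content_loaded, timeout=5.0, interval=0.1)
    print(f"content ready after {checks['count']} checks")
```

Using a monotonic clock for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.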