Hey Product Hunt! 👋

I'm excited to share Docuglean with you today!

What is it?
Docuglean is an open-source SDK that makes document processing actually enjoyable. Extract structured data from receipts, invoices, contracts, or any document in just a few lines of code.

The problem I was solving:
I was building an expense tracking app and spent weeks writing boilerplate code to parse receipts and extract data. Every time I wanted to switch AI providers or add a new document type, I had to rewrite everything. I realized plenty of other developers run into the same problem.

The solution:
A unified SDK that abstracts away the complexity. One API, multiple providers, type-safe outputs.
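
To give a feel for what "one API, multiple providers, type-safe outputs" means, here is a minimal, self-contained Python sketch of the pattern. It is not Docuglean's actual API: `Receipt`, `extract`, `PROVIDERS`, and the two fake providers are hypothetical names made up for illustration, and the "providers" return canned JSON instead of calling a model.

```python
# Hypothetical sketch of the pattern only; none of these names come from the Docuglean SDK.
import json
from dataclasses import dataclass, fields
from typing import Callable, Dict


@dataclass
class Receipt:
    """The schema: the typed shape we want back from any provider."""
    merchant: str
    total: float
    currency: str


# A "provider" here is any callable that maps document text to a JSON string.
Provider = Callable[[str], str]


def fake_openai(text: str) -> str:
    # Stand-in for a real model call; note that it returns the total as a string.
    return json.dumps({"merchant": "Corner Cafe", "total": "12.50", "currency": "USD"})


def fake_mistral(text: str) -> str:
    # Another stand-in; this one returns the total as a number.
    return json.dumps({"merchant": "Corner Cafe", "total": 12.5, "currency": "USD"})


PROVIDERS: Dict[str, Provider] = {"openai": fake_openai, "mistral": fake_mistral}


def extract(text: str, schema: type, provider: str):
    """Run one provider and coerce its JSON into an instance of the schema."""
    raw = json.loads(PROVIDERS[provider](text))
    # Convert each field to its declared type; a missing field raises KeyError.
    values = {f.name: f.type(raw[f.name]) for f in fields(schema)}
    return schema(**values)


receipt = extract("...receipt text...", Receipt, provider="openai")
same_receipt = extract("...receipt text...", Receipt, provider="mistral")
```

In this sketch, switching providers means changing one argument, and the caller always gets a `Receipt` with a float `total`, whichever provider produced the raw JSON.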

Key features:
✅ Multi-provider support (OpenAI, Mistral, Gemini, HuggingFace)
✅ Type-safe extraction with Zod/Pydantic schemas
✅ Batch processing with automatic error handling
✅ Document classification (auto-split 100+ page docs)
✅ Local parsing for common formats (no API needed)
✅ Python + TypeScript support
✅ Apache 2.0 licensed - free forever

Who is this for?
- Developers building fintech/expense apps
- Teams processing invoices/receipts at scale
- Anyone tired of writing document parsing code
- Startups needing to extract data from contracts/forms
- Developers who want to avoid vendor lock-in

What makes it different?
Unlike other solutions, Docuglean is:

- Provider-agnostic (switch with one line)
- Type-safe by default (no manual JSON parsing)
- Built for batch processing (handle thousands of docs)
- Open source (inspect, modify, contribute)
- Cost-efficient (local parsing for common formats)

I'd love your feedback on:

- What document types would you want to process?
- What features would make this more useful?
- What providers should we add next?

Thanks for checking it out! Happy to answer any questions 🚀

GitHub: https://github.com/docuglean-ai/...

Docuglean

Extract structured data from any document in 3 lines
