Open-source SDK for extracting structured data from documents. Works with OpenAI, Mistral, Google Gemini, and HuggingFace. Type-safe extraction, batch processing, and document classification built in. Available for Python and TypeScript.
Hey Product Hunt!
I'm excited to share Docuglean with you today!
What is it?
Docuglean is an open-source SDK that makes document processing actually enjoyable. Extract structured data from receipts, invoices, contracts, or any other document in just a few lines of code.
The problem I was solving:
I was building an expense-tracking app and spent weeks writing boilerplate code to parse receipts and extract their data. Every time I wanted to switch AI providers or add a new document type, I had to rewrite everything. I realized many developers face the same problem.
The solution:
A unified SDK that abstracts away the complexity: one API, multiple providers, type-safe outputs.
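To illustrate the "one API, multiple providers" idea, here is a minimal Python sketch of how a provider-agnostic layer can be structured. It is not Docuglean's actual API: the `Provider` protocol, the `StubOpenAI`/`StubMistral` classes, the `PROVIDERS` registry, and the `extract` function are hypothetical names, and the providers are offline stubs rather than real OpenAI or Mistral clients.

```python
from typing import Callable, Dict, Protocol


class Provider(Protocol):
    """Hypothetical interface that every backend implements."""

    def complete(self, prompt: str) -> str: ...


class StubOpenAI:
    """Stand-in for an OpenAI-backed provider (makes no network calls)."""

    def complete(self, prompt: str) -> str:
        return f"openai:{prompt}"


class StubMistral:
    """Stand-in for a Mistral-backed provider (makes no network calls)."""

    def complete(self, prompt: str) -> str:
        return f"mistral:{prompt}"


# Registry mapping a provider name to a factory. Adding a backend means
# adding one entry here, not rewriting the calling code.
PROVIDERS: Dict[str, Callable[[], Provider]] = {
    "openai": StubOpenAI,
    "mistral": StubMistral,
}


def extract(text: str, provider: str = "openai") -> str:
    """Hypothetical single entry point: the caller only picks a provider name."""
    try:
        backend = PROVIDERS[provider]()
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return backend.complete(f"Extract fields from: {text}")


# Switching providers is a one-argument change.
print(extract("Total: $12.50", provider="openai"))   # openai:Extract fields from: Total: $12.50
print(extract("Total: $12.50", provider="mistral"))  # mistral:Extract fields from: Total: $12.50
```

The calling code depends only on the `Provider` shape, so swapping backends never touches the extraction logic itself.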
Key features:
- Multi-provider support (OpenAI, Mistral, Gemini, HuggingFace)
- Type-safe extraction with Zod/Pydantic schemas
- Batch processing with automatic error handling
- Document classification (auto-split 100+ page docs)
- Local parsing for common formats (no API needed)
- Python + TypeScript support
- Apache 2.0 licensed, free forever
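"Batch processing with automatic error handling" generally means that each document is processed independently and failures are recorded per document, so one bad file does not abort the whole batch. The stdlib-only sketch below shows that pattern. It is an illustration, not Docuglean's implementation: `process_document`, `BatchResult`, and `run_batch` are hypothetical, and the extractor is a toy that only reads a `Total:` line.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BatchResult:
    succeeded: Dict[str, float] = field(default_factory=dict)
    failed: Dict[str, str] = field(default_factory=dict)


def process_document(text: str) -> float:
    """Toy extractor: pull the amount from a 'Total:' line."""
    for line in text.splitlines():
        if line.startswith("Total:"):
            return float(line.split(":", 1)[1].strip().lstrip("$"))
    raise ValueError("no 'Total:' line found")


def run_batch(docs: Dict[str, str], max_workers: int = 4) -> BatchResult:
    """Process documents concurrently, collecting per-document errors."""
    result = BatchResult()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(process_document, text) for name, text in docs.items()}
        for name, future in futures.items():
            try:
                result.succeeded[name] = future.result()
            except Exception as exc:  # record the failure and keep going
                result.failed[name] = str(exc)
    return result


docs = {
    "receipt1.txt": "Coffee\nTotal: $4.50",
    "receipt2.txt": "Lunch\nTotal: $12.00",
    "broken.txt": "no totals here",
}
report = run_batch(docs)
print(report.succeeded)  # {'receipt1.txt': 4.5, 'receipt2.txt': 12.0}
print(report.failed)     # {'broken.txt': "no 'Total:' line found"}
```

Returning successes and failures side by side lets the caller retry or flag only the documents that failed.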
Who is this for?
- Developers building fintech/expense apps
- Teams processing invoices/receipts at scale
- Anyone tired of writing document parsing code
- Startups needing to extract data from contracts/forms
- Developers who want to avoid vendor lock-in
What makes it different?
Unlike other solutions, Docuglean is:
- Provider-agnostic (switch with one line)
- Type-safe by default (no manual JSON parsing)
- Built for batch processing (handle thousands of docs)
- Open source (inspect, modify, contribute)
- Cost-efficient (local parsing for common formats)
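"Type-safe by default" means model output is validated against a declared schema before it reaches your code, instead of being handled as loose JSON. Docuglean names Zod/Pydantic for this; to keep the sketch dependency-free, it swaps in a standard-library dataclass and a hypothetical `parse_receipt` helper, so treat it as an illustration of the idea rather than the SDK's behavior.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Receipt:
    merchant: str
    total: float
    currency: str


def parse_receipt(raw: str) -> Receipt:
    """Validate raw model output (a JSON string) into a typed Receipt."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Every declared field must be present; extra keys are ignored.
    missing = [f.name for f in fields(Receipt) if f.name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["merchant"], str) or not isinstance(data["currency"], str):
        raise ValueError("merchant and currency must be strings")
    total = data["total"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        raise ValueError("total must be a number")
    return Receipt(merchant=data["merchant"], total=float(total), currency=data["currency"])


receipt = parse_receipt('{"merchant": "Cafe Luna", "total": 12.5, "currency": "USD"}')
print(receipt.total + 1)  # 13.5

try:
    parse_receipt('{"merchant": "Cafe Luna", "currency": "USD"}')
except ValueError as err:
    print(err)  # missing fields: ['total']
```

Malformed output fails loudly at the boundary, so downstream code can rely on `receipt.total` being a float.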
I'd love your feedback on:
What document types would you want to process?
What features would make this more useful?
What providers should we add next?
Thanks for checking it out! Happy to answer any questions.
GitHub: https://github.com/docuglean-ai/...