Docuglean

Extract structured data from any document in 3 lines

Open-source SDK for extracting structured data from documents. Works with OpenAI, Mistral, Google Gemini, and HuggingFace. Type-safe extraction, batch processing, and document classification built-in. Available for Python and TypeScript.
Free

Victor Evogor
Maker

Hey Product Hunt! 👋

I'm excited to share Docuglean with you today!

What is it?
Docuglean is an open-source SDK that makes document processing actually enjoyable. Extract structured data from receipts, invoices, contracts, or any document in just a few lines of code.

The problem I was solving:
I was building an expense tracking app and spent weeks writing boilerplate code to parse receipts and extract data. Every time I wanted to switch AI providers or add a new document type, I had to rewrite everything. I realized every developer faces this same problem.

The solution:
A unified SDK that abstracts away the complexity. One API, multiple providers, type-safe outputs.


Key features:
✅ Multi-provider support (OpenAI, Mistral, Gemini, HuggingFace)
✅ Type-safe extraction with Zod/Pydantic schemas
✅ Batch processing with automatic error handling
✅ Document classification (auto-split 100+ page docs)
✅ Local parsing for common formats (no API needed)
✅ Python + TypeScript support
✅ Apache 2.0 licensed - free forever
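
To make the schema-validated extraction and batch error handling from the list above more concrete, here is a minimal Python sketch. It is not Docuglean's actual API, which this post does not show: `Receipt`, `parse_receipt`, and `process_batch` are hypothetical names, the raw JSON strings stand in for a provider's replies, and a standard-library dataclass stands in for a Pydantic model so the example runs without dependencies.

```python
import json
from dataclasses import dataclass


@dataclass
class Receipt:
    """Target schema (hypothetical): fields expected for each receipt."""
    merchant: str
    total: float


def parse_receipt(raw_json: str) -> Receipt:
    """Validate a provider's raw JSON reply against the Receipt schema."""
    data = json.loads(raw_json)
    merchant = data["merchant"]  # KeyError if the field is missing
    total = data["total"]
    if not isinstance(merchant, str):
        raise TypeError("merchant must be a string")
    # bool is a subclass of int, so reject it explicitly
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        raise TypeError("total must be a number")
    return Receipt(merchant=merchant, total=float(total))


def process_batch(replies: dict[str, str]) -> tuple[dict[str, Receipt], dict[str, str]]:
    """Parse every reply; a bad document is recorded, not fatal to the batch."""
    results: dict[str, Receipt] = {}
    errors: dict[str, str] = {}
    for name, raw in replies.items():
        try:
            results[name] = parse_receipt(raw)
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            errors[name] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

Calling `process_batch` with a mix of valid and malformed replies returns the typed `Receipt` objects alongside a per-document error map, so callers never handle untyped JSON themselves.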

Who is this for?
- Developers building fintech/expense apps
- Teams processing invoices/receipts at scale
- Anyone tired of writing document parsing code
- Startups needing to extract data from contracts/forms
- Developers who want to avoid vendor lock-in

What makes it different?
Unlike other solutions, Docuglean is:

  • Provider-agnostic (switch with one line)

  • Type-safe by default (no manual JSON parsing)

  • Built for batch processing (handle thousands of docs)

  • Open source (inspect, modify, contribute)

  • Cost-efficient (local parsing for common formats)
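
One common way to get the "switch with one line" property is an adapter registry, sketched below in Python. This is an illustration of the general pattern, not Docuglean's implementation: `PROVIDERS`, `extract`, and the two adapter functions are hypothetical, and the adapters are stubs that tag their input instead of calling a real vendor API.

```python
from typing import Callable

# Hypothetical adapters: each takes document text and returns raw output.
# Real adapters would call the vendor's API; these stubs only tag the input.
def openai_adapter(text: str) -> str:
    return f"[openai] {text}"


def mistral_adapter(text: str) -> str:
    return f"[mistral] {text}"


# Registry mapping a provider name to its adapter.
PROVIDERS: dict[str, Callable[[str], str]] = {
    "openai": openai_adapter,
    "mistral": mistral_adapter,
}


def extract(text: str, provider: str) -> str:
    """Route the document to the chosen provider's adapter."""
    try:
        adapter = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return adapter(text)
```

With this shape, changing `extract(doc, "openai")` to `extract(doc, "mistral")` is the only edit the calling code needs, because everything vendor-specific lives behind the adapter.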

I'd love your feedback on:

  1. What document types would you want to process?

  2. What features would make this more useful?

  3. What providers should we add next?

Thanks for checking it out! Happy to answer any questions 🚀

GitHub: https://github.com/docuglean-ai/...