Mohsin Raja

Pduut - From textbooks to structured knowledge — PDFs, untangled

Pduut is an open-source PDF extractor built for students & researchers. It splits books page-by-page, capturing text, equations, and diagrams into structured JSON—perfect for RAG datasets. Join us, contribute, and make learning accessible!

Mohsin Raja
Maker
When I was studying, I often struggled with textbooks, especially when I needed specific answers or wanted to build a dataset for retrieval-augmented generation (RAG). Extracting text, equations, and diagrams manually was painfully slow. That's why I built Pduut (PDF Data Unification and Understanding Tool).

📖 Splits PDFs page by page
✍️ Preserves text, diagrams, equations, and graphs
🔗 Outputs structured JSON (ready for RAG or research)
🌍 Open-source, so anyone can contribute and improve it

This is still early, and I'd love feedback from students, researchers, and builders. Together, we can make PDF knowledge extraction smarter and more accessible. 🚀
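
To make the "structured JSON, page by page" idea concrete, here is a minimal sketch of what one record per page could look like. The class name `PageRecord`, the field names (`page`, `text`, `equations`, `diagrams`), and the sample values are illustrative assumptions, not Pduut's actual schema or API.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PageRecord:
    """One page of a book. Hypothetical layout, not Pduut's real schema."""
    page: int
    text: str
    equations: list[str] = field(default_factory=list)  # e.g. LaTeX strings
    diagrams: list[str] = field(default_factory=list)   # e.g. image file paths


def to_json(records: list[PageRecord]) -> str:
    """Serialize page records to a JSON array with one object per page."""
    return json.dumps([asdict(r) for r in records], ensure_ascii=False, indent=2)


# Made-up sample data standing in for extracted pages.
pages = [
    PageRecord(
        page=1,
        text="Newton's second law relates force and acceleration.",
        equations=["F = ma"],
    ),
    PageRecord(
        page=2,
        text="Figure 1 shows a free-body diagram.",
        diagrams=["page_002_fig1.png"],
    ),
]

output = to_json(pages)
print(output)
```

Keeping one object per page means a RAG pipeline can index each page as its own chunk and cite the page number alongside any retrieved text or equation.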