[LW24] Megaparse

Open-source File Parser optimized for LLM ingestion

File parser optimized for LLM ingestion. Parse PDF, DOCX, and PPTX files into a format that is ideal for LLMs. All of this is accessible from a Python package, an API, or a queue.
Free
Launch tags: Developer Tools, GitHub

Stan Girard
Hi everyone,

Today I’d like to introduce you to the new Quivr project. It is a simple Python package and API that takes in documents such as PDF, DOCX, PPTX, ... and turns them into Markdown.

It has several new abilities:
* OCR
* Vision Models
* Table Optimization in the extraction
* Open-source

You can use it in any of your products where you need to parse files and then send them to an LLM, or simply store them.

Here is how to get started:
* Go to https://github.com/QuivrHQ/MegaP...
* pip install megaparse
* Have fun

Give it a try! We’d love to hear your feedback and ideas in the comments.

This is part of Supabase mega Launch Week -> https://launchweek.dev/HOME
Sacha Dumay
@stan_girard great tool !!!!
Damien Henry
@stan_girard whoo! This is awesome!!! I'll try it in my next project
Christophe Pasquier
Everyone who has gone through the pain of parsing slides and PDFs knows how big a problem this solves ;) GG team!
Stan Girard
@christophepas Thanks mate! Let me know if you are using it and I'll gladly help you improve it
Michael Ohana
Awesome ! How does it tackle tables in financial documents?
Stan Girard
@michaelohana This is a hard piece to tackle, and we are currently working hard on improving tables. We are exploring some techniques, for example combining LLM vision models with current OCR, and passing the table to a dataframe. Would love to tell you more or help you with your use case. Ping me if needed on Twitter @_StanGirard
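To make the "passing the table to a dataframe" idea a bit more concrete, here is a minimal sketch of one possible first step: turning a pipe-delimited Markdown table into a list of row dictionaries. This is not Megaparse code and does not use its API. The function name `parse_markdown_table` and the sample financial figures are made up for illustration, and the sketch assumes a simple table with no escaped pipes inside cells.

```python
def split_row(line: str) -> list[str]:
    """Split one Markdown table line into stripped cell strings."""
    text = line.strip()
    # Drop exactly one leading and one trailing pipe, if present.
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]
    # Assumption: cells do not contain escaped pipes (\|).
    return [cell.strip() for cell in text.split("|")]


def is_separator(cells: list[str]) -> bool:
    """True if every cell looks like ---, :---, ---: or :---:."""
    return all(cell and "-" in cell and set(cell) <= set("-:") for cell in cells)


def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Hypothetical helper: parse the first Markdown table found in the text."""
    lines = [ln for ln in markdown.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2 or not is_separator(split_row(lines[1])):
        return []
    header = split_row(lines[0])
    # Skip the header and the separator line; zip drops surplus cells.
    return [dict(zip(header, split_row(line))) for line in lines[2:]]


# Made-up sample data, for illustration only.
SAMPLE = """
| Quarter | Revenue | Margin |
|---------|--------:|:------:|
| Q1      | 1,200   | 31%    |
| Q2      | 1,450   | 34%    |
"""

rows = parse_markdown_table(SAMPLE)
for row in rows:
    print(row)
```

If pandas is installed, `pandas.DataFrame(rows)` accepts a list of dictionaries like this directly. All values stay as strings here, so numeric cleanup (thousands separators, percent signs) would still be a separate step.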
Ashit Vora
Congrats on the launch @stan_girard @amine_dirhoussi @chloe_daems Super helpful. We are working on a product that needs something similar though we have already solved the PDF parsing problem. Quick question - do you plan to add Excel / Spreadsheet as well? This would be super helpful. Excited to give it a try!
Max Comperatore
stan this is sweet. thank you. will use. upvoted and starred
Stan Girard
@maxcompe Thanks mate! We worked hard on this one
Florian Buguet
Wow, this looks super handy for integrating document parsing into LLM workflows! 🚀 Love that it's open-source and includes OCR + table optimization—makes it a no-brainer for anyone working with complex document data. Can't wait to test it out! 🔥
Huzaifa Shoukat
Congrats on the launch! Megaparse looks like a game-changer for parsing docs into Markdown format. What types of files do you find it works best with?