hsingyuchen

DocClean - Convert documents to Markdown. 100% local

DocClean is a privacy-first, self-hosted document converter that turns PDF, Word, Excel, and images into clean, editable Markdown on your own machine. No cloud upload, no data leaving your server. Start with one command: docker-compose up. It includes GPU-accelerated OCR for scanned files, a built-in Markdown editor, and REST API support. Pro adds cross-document search, AI Q&A, and book compilation.


hsingyuchen
Maker
Hey Product Hunt! Maker here.

I built DocClean because I was genuinely frustrated. Every time I needed to convert a sensitive PDF or image to Markdown, the answer was always "upload it to Mathpix" or "Smallpdf." Why should I ship private documents to someone else's server? So I built the thing I wanted to exist.

Every cloud converter makes you upload first, which is a dealbreaker for legal docs, medical records, financial reports, and anything under NDA. DocClean runs entirely on your own hardware. One command, and zero data leaves your machine.

What makes it interesting: PaddleOCR (far better than Tesseract for CJK text), language-adaptive OCR with automatic model switching, GPU acceleration out of the box, and a full pipeline, not just conversion.

Solo developer, first launch. Rough edges exist. Harsh feedback very welcome.

git clone https://github.com/chen64811-shi... && docker-compose up
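For readers curious what "language-adaptive OCR with automatic model switching" can mean in practice, here is a minimal, hypothetical sketch. It is not DocClean's actual logic. It assumes a first-pass text sample is available (for example from a PDF's embedded text layer or a quick default-model pass), measures how much of it is CJK, and picks a PaddleOCR `lang` code ("ch" or "en"). The function names and the 0.1 threshold are illustrative assumptions.

```python
# Hypothetical sketch of language-adaptive OCR model selection.
# Not DocClean's implementation; "ch" and "en" are PaddleOCR `lang` codes,
# and the threshold value is an arbitrary illustrative choice.


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def pick_ocr_lang(sample_text: str, threshold: float = 0.1) -> str:
    """Switch to the Chinese model when enough CJK appears in the sample."""
    return "ch" if cjk_ratio(sample_text) >= threshold else "en"


if __name__ == "__main__":
    print(pick_ocr_lang("Quarterly report 2024"))      # en
    print(pick_ocr_lang("季度报告 Quarterly report"))  # ch
```

The chosen code would then be passed to whatever OCR engine the pipeline uses. A real implementation would likely cover more scripts (Japanese kana, Hangul, traditional Chinese) and tune the threshold on real documents.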
hsingyuchen

Hi everyone! 👋 I’m excited to share DocClean with you all today: a 100% local-first tool that turns messy PDFs, Word docs, and even scanned images into clean, structured Markdown, with built-in OCR for both English and Chinese. No data leaves your machine, ever.

Get started in one line with Docker:

docker-compose up -d

This project grew out of my own frustration with bloated online converters that upload everything to the cloud. I wanted something fast, private, and simple to run anywhere.

I’d love to hear your feedback, ideas, or issues. Let me know what you think! 🚀