DeepSeek-OCR - Read documents like an image
DeepSeek-OCR is a model that compresses long text by treating it as an image. This optical compression uses far fewer vision tokens to represent documents, unlocking new levels of efficiency for long-context tasks while delivering powerful OCR capabilities.
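For intuition on the token economics behind that claim, here is a rough back-of-the-envelope sketch. The patch size, page resolution, pooling factor, and characters-per-token ratio are illustrative assumptions, not figures from the DeepSeek-OCR release:

```python
# Toy comparison of "optical compression": how many vision tokens a rendered
# page might cost versus the text tokens it replaces. All numbers below are
# illustrative assumptions, not values from the DeepSeek-OCR paper.

def vision_tokens(width_px: int, height_px: int, patch_px: int = 16) -> int:
    """Vision tokens for an image split into non-overlapping square patches."""
    return (width_px // patch_px) * (height_px // patch_px)

def text_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough text-token estimate from an average characters-per-token ratio."""
    return int(num_chars / chars_per_token)

if __name__ == "__main__":
    # A dense page: ~3,000 characters, rendered at 1024x1024 for a vision encoder.
    page_chars = 3_000
    raw_patches = vision_tokens(1024, 1024)   # 64 * 64 = 4096 patches
    tt = text_tokens(page_chars)              # ~750 text tokens
    # Vision encoders usually compress patches further (hypothetical 16x here);
    # that downsampling is where the claimed token savings would come from.
    pooled = raw_patches // 16                # 256 vision tokens after pooling
    print(f"text tokens ~{tt}, raw patches {raw_patches}, pooled vision tokens ~{pooled}")
```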



Replies
Flowtica Scribe
That model seems heavily focused on grounding. Not sure how it compares with PaddleOCR-VL or Nanonet-OCR2.
Interesting update to DeepSeek models. Thanks for sharing the details, Zac.
Triforce Todos
Would love to see benchmarks. How does DeepSeek-OCR compare with GPT-4V or Gemini for table parsing?
Super Intern
Very much a CV paper; it reminds me of the VGG era.
Theysaid
It’s impressive how they keep innovating even with limited compute resources.
DeepSeek-OCR is a strong leap forward in document processing: treating long texts as images and then doing OCR and reasoning on them is a clever workaround for token-limit bottlenecks. The community highlights that the upload-and-reason feature makes it useful for real work. To add even more value, I'd love to see a live layout-awareness mode (so it doesn't just capture text but preserves and exposes tables, sidebars, and image-text interplay for editing and export) and a failure-root-explanation panel (triggered when the OCR or reasoning chain fails, showing the weak link in the chain to help users debug rather than just "retry"). Great work! Can't wait to see how you scale this!