Zac Zuo

DeepSeek-OCR - Read documents like an image

DeepSeek-OCR is a model that compresses long text by treating it as an image. This optical compression uses far fewer vision tokens to represent documents, unlocking new levels of efficiency for long-context tasks while delivering powerful OCR capabilities.
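
As a rough illustration of what "far fewer vision tokens" means, the saving can be expressed as a ratio of text tokens to vision tokens. The sketch below is only back-of-the-envelope arithmetic: the function name `compression_ratio` and the token counts are hypothetical placeholders, not measured results or part of any DeepSeek API.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Return how many text tokens each vision token stands in for."""
    if text_tokens < 0 or vision_tokens <= 0:
        raise ValueError("text_tokens must be >= 0 and vision_tokens must be > 0")
    return text_tokens / vision_tokens


# Hypothetical counts for a single page, chosen for illustration only.
page_text_tokens = 1000    # assumed size of the page as plain text tokens
page_vision_tokens = 100   # assumed size of the same page as vision tokens

ratio = compression_ratio(page_text_tokens, page_vision_tokens)
print(f"Each vision token covers about {ratio:.1f} text tokens")
```

A higher ratio means more of a long document fits in the same context window, provided the text can still be decoded accurately from the image.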

Replies

Zac Zuo
Hi everyone! DeepSeek's multimodal models haven't always been their main focus, but I think this was a strategic choice: "train the brain, then the eyes." Now, with DeepSeek-OCR, we're seeing that strategy pay off in a really interesting way. On the surface, it's a powerful OCR model that can convert documents to Markdown, do general image OCR, parse tables, and more. But the really clever idea here is their exploration of "optical compression." They're testing if it's possible to turn long documents into images, and then use a much smaller number of vision tokens to store the same information that would have required a huge number of text tokens. It's a smart approach. If compute is the bottleneck, you find clever ways to be more efficient. It's a good reminder that there's often more than one way to solve a problem, and real innovation often comes from working with constraints. Yeah, DeepSeek probably can't get more NVIDIA GPUs, but that's not stopping them from pushing ahead, is it? :)
Syed Sohaib Ahmed

That model seems heavily focused on grounding. Not sure how it compares with PaddleOCR-VL or Nanonets-OCR2.

Kumar Abhishek

Interesting update to DeepSeek models. Thanks for sharing the details, Zac.

Abdul Rehman

Would love to see benchmarks. How does DeepSeek-OCR compare with GPT-4V or Gemini for table parsing?

Alvis Chu

A very CV-style paper. It reminds me of the VGG era.

Chris Hicken

It’s impressive how they keep innovating even with limited compute resources.

Esther George
This is such a clever workaround for the compute bottleneck — turning text into images to save tokens is genius-level constraint thinking. DeepSeek may not have infinite GPUs, but they clearly have infinite creativity. 🔥 Kudos to you and your team @zaczuo
Haris Mehmood

DeepSeek-OCR is a strong leap forward in document processing. Treating long texts as images and then doing OCR and reasoning on them is a clever workaround for token-limit bottlenecks. The community highlights that the upload-and-reason feature makes it useful for real work. To add even more value, I'd love to see a live layout-awareness mode (so it doesn't just capture text but preserves and exposes tables, sidebars and image-text interplay for editing and export) and a failure-root-explanation panel (triggered when the OCR or reasoning chain fails, showing the weak link in the chain to help users debug rather than just "retry"). Great work! Can't wait to see how you scale this!

Nick L
It's a smart approach indeed. It would be great to test this model.