Open-source LLM optimized for advanced reasoning and code

DeepSeek-OCR - Read documents like an image

Flowtica Scribe

7mo ago

DeepSeek-OCR is a model that compresses long text by treating it as an image. This optical compression uses far fewer vision tokens to represent documents, unlocking new levels of efficiency for long-context tasks while delivering powerful OCR capabilities.

Flowtica Scribe

Hunter

Hi everyone! Multimodal models haven't always been DeepSeek's main focus, but I think this was a strategic choice: "train the brain, then the eyes." Now, with DeepSeek-OCR, we're seeing that strategy pay off in a really interesting way. On the surface, it's a powerful OCR model that can convert documents to Markdown, do general image OCR, parse tables, and more. But the really clever idea here is their exploration of "optical compression." They're testing whether it's possible to render long documents as images and then use a much smaller number of vision tokens to store the same information that would otherwise require a huge number of text tokens. It's a smart approach: if compute is the bottleneck, you find clever ways to be more efficient. It's a good reminder that there's often more than one way to solve a problem, and real innovation often comes from working within constraints. Yeah, DeepSeek probably can't get more NVIDIA GPUs, but that's not stopping them from pushing ahead, is it? :)
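To make the "fewer vision tokens" idea a bit more concrete, here's a rough back-of-the-envelope sketch. The 16-pixel patch size and the 16x token compressor are my reading of the encoder design described in the DeepSeek-OCR paper, so treat both as assumptions; the 2,000-token page is a made-up example, not a measured result.

```python
# Back-of-the-envelope estimate of "optical compression" for one page.
# ASSUMPTIONS (not stated in this thread): the image is cut into 16x16-pixel
# patches, and the encoder then shrinks the patch tokens by a further 16x.

PATCH_SIZE = 16         # assumed pixels per patch side
TOKEN_COMPRESSION = 16  # assumed downsampling factor applied to patch tokens


def vision_tokens(image_side: int) -> int:
    """Vision tokens for a square image of image_side x image_side pixels."""
    patches_per_side = image_side // PATCH_SIZE
    patch_tokens = patches_per_side ** 2
    return patch_tokens // TOKEN_COMPRESSION


def compression_ratio(text_tokens: int, image_side: int) -> float:
    """How many text tokens each vision token has to stand in for."""
    return text_tokens / vision_tokens(image_side)


if __name__ == "__main__":
    # Hypothetical page: 2,000 text tokens rendered into a 1024x1024 image.
    print(vision_tokens(1024))                      # 256
    print(round(compression_ratio(2000, 1024), 2))  # 7.81
```

So under these assumptions, a page that would cost about 2,000 text tokens fits into 256 vision tokens, roughly a 7.8x saving. The open question the paper explores is how much OCR accuracy you keep as that ratio goes up.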

7mo ago

That model seems heavily focused on grounding. Not sure how it compares with PaddleOCR-VL or Nanonets-OCR2.

7mo ago

Interesting update to DeepSeek models. Thanks for sharing the details, Zac.

7mo ago

Super Intern

Very much a CV paper; it reminds me of the VGG era.

7mo ago

Theysaid

It’s impressive how they keep innovating even with limited compute resources.

7mo ago

This is such a clever workaround for the compute bottleneck: turning text into images to save tokens is genius-level constraint thinking. DeepSeek may not have infinite GPUs, but they clearly have infinite creativity. 🔥 Kudos to you and your team @zaczuo

7mo ago

DeepSeek-OCR is a strong leap forward in document processing: treating long texts as images and then running OCR and reasoning on them is a clever workaround for token-limit bottlenecks. The community has highlighted that the upload-and-reason feature makes it useful for real work. To add even more value, I'd love to see a live layout-awareness mode (so it doesn't just capture text but also preserves and exposes tables, sidebars, and image-text interplay for editing and export) and a failure root-cause panel (triggered when the OCR or reasoning chain fails, showing the weak link in the chain so users can debug rather than just "retry"). Great work! Can't wait to see how you scale this!

7mo ago

It's a smart approach indeed. It would be great to test this model.

7mo ago

Awesome. Is it available via an API?

6mo ago