A lightweight (0.9B) professional OCR model. Achieves SOTA (94.6 on OmniDocBench) on complex layouts, tables, and handwriting. Supports vLLM/SGLang for ultra-fast inference.
Tested this by throwing an image of a complex table at it.
The recognition was extremely accurate and fast. It reconstructed the table structure into clean Markdown & JSON perfectly.
For a model with only 0.9B parameters, this efficiency is impressive. It should be a good fit for RAG pipelines where you need to parse documents with heavy layouts without adding much latency.
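As a rough sketch of the RAG angle (plain Python, hypothetical helper name, no specific OCR library assumed): once the model has emitted Markdown, you can split it into table and prose chunks before indexing, so tables are retrieved as intact units.

```python
def split_markdown_chunks(md: str) -> list[dict]:
    """Split OCR'd Markdown into 'table' and 'text' chunks for indexing.

    Hypothetical helper for a RAG pipeline; assumes pipe-style Markdown
    tables, i.e. table rows start with '|'.
    """
    chunks = []
    buf, in_table = [], False
    for line in md.splitlines():
        is_table_row = line.lstrip().startswith("|")
        # When we cross a table/text boundary, flush the current chunk.
        if is_table_row != in_table and buf:
            chunks.append({"kind": "table" if in_table else "text",
                           "content": "\n".join(buf).strip()})
            buf = []
        in_table = is_table_row
        buf.append(line)
    if buf:
        chunks.append({"kind": "table" if in_table else "text",
                       "content": "\n".join(buf).strip()})
    return [c for c in chunks if c["content"]]
```

Each chunk can then be embedded separately; keeping a whole table in one chunk tends to retrieve better than cutting it mid-row.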
It handles mixed content like handwriting, LaTeX formulas, and stamps surprisingly well. Under the hood, the model pairs a CogViT visual encoder with a GLM-0.5B decoder, and it supports vLLM and SGLang out of the box. Great for edge deployment.
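For anyone wanting to try the vLLM route: a minimal sketch of building the request for a vLLM server's OpenAI-compatible endpoint. The model name, prompt, and URL below are placeholders, not from the model card; check the card for the exact served name and recommended OCR prompt.

```python
import base64

def build_ocr_request(image_bytes: bytes, model: str = "ocr-model",
                      prompt: str = "Convert this page to Markdown.") -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload.

    `model` and `prompt` are placeholders; substitute the served model id
    and the prompt recommended by the model card.
    """
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # Inline the page image as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
    }

# After serving the model (e.g. `vllm serve <model-id>`), POST this payload
# as JSON to http://localhost:8000/v1/chat/completions.
```

The same payload shape should work against SGLang's OpenAI-compatible server as well.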
Replies
Flowtica Scribe
Hi everyone!