Zac Zuo

GLM-OCR - SOTA document parsing & OCR in just 0.9B parameters

A lightweight (0.9B) professional OCR model. Achieves SOTA (94.6 on OmniDocBench) on complex layouts, tables, and handwriting. Supports vLLM/SGLang for ultra-fast inference.

Zac Zuo

Hi everyone!

Tested this by throwing an image of a complex table at it.

The recognition was extremely accurate and fast. It reconstructed the table structure into clean Markdown & JSON perfectly.

For a model with only 0.9B parameters, that efficiency is impressive. It's a good fit for RAG pipelines where you need to parse layout-heavy documents without adding much latency.

It handles mixed content like handwriting, LaTeX formulas, and stamps surprisingly well. Under the hood, the model pairs a CogViT visual encoder with a GLM-0.5B decoder, and it supports vLLM and Ollama out of the box. Great for edge deployment.
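For anyone who wants to try the vLLM route: since vLLM exposes an OpenAI-compatible chat endpoint, you can send an image and ask for the table as Markdown. This is just a sketch of building that request payload; the model ID (`zai-org/GLM-OCR`) and the prompt wording are my assumptions, so check the model card for the exact identifiers.

```python
import base64


def build_ocr_request(image_path: str, model: str = "zai-org/GLM-OCR") -> dict:
    """Build an OpenAI-style chat payload for a vLLM server hosting the model.

    NOTE: the model ID and prompt text are assumptions for illustration;
    consult the model card for the recommended values.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # The document image, inlined as a base64 data URL.
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                    # The parsing instruction.
                    {"type": "text", "text": "Extract the table in this image as Markdown."},
                ],
            }
        ],
        "temperature": 0.0,  # deterministic output is what you want for OCR
    }


# To actually send it, start the server with `vllm serve <model>` and POST
# the payload to http://localhost:8000/v1/chat/completions.
```

The same payload shape works with the `openai` Python client pointed at the local server, which keeps the RAG-pipeline integration trivial.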