A lightweight (0.9B) professional OCR model. Achieves SOTA (94.6 on OmniDocBench) on complex layouts, tables, and handwriting. Supports vLLM/SGLang for ultra-fast inference.
Tested this by throwing an image of a complex table at it.
The recognition was extremely accurate and fast. It reconstructed the table structure into clean Markdown & JSON perfectly.
For a model with only 0.9B parameters, this efficiency is impressive. It should be a good fit for RAG pipelines where you need to parse documents with heavy layouts without adding much latency.
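As a rough sketch of the RAG angle (plain Python, hypothetical helper name, no specific OCR library assumed): once the model has emitted Markdown, you can split it into table and prose chunks before indexing, so tables are retrieved as intact units.

```python
def split_markdown_chunks(md: str) -> list[dict]:
    """Split OCR'd Markdown into 'table' and 'text' chunks for indexing.

    Hypothetical helper for a RAG pipeline; assumes pipe-style Markdown
    tables, i.e. table rows start with '|'.
    """
    chunks = []
    buf, in_table = [], False
    for line in md.splitlines():
        is_table_row = line.lstrip().startswith("|")
        # When we cross a table/text boundary, flush the current chunk.
        if is_table_row != in_table and buf:
            chunks.append({"kind": "table" if in_table else "text",
                           "content": "\n".join(buf).strip()})
            buf = []
        in_table = is_table_row
        buf.append(line)
    if buf:
        chunks.append({"kind": "table" if in_table else "text",
                       "content": "\n".join(buf).strip()})
    return [c for c in chunks if c["content"]]
```

Each chunk can then be embedded separately; keeping a whole table in one chunk tends to retrieve better than cutting it mid-row.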
It handles mixed content like handwriting, LaTeX formulas, and stamps surprisingly well. Under the hood, the model pairs a CogViT visual encoder with a GLM-0.5B decoder, and it supports vLLM and SGLang out of the box. Great for edge deployment.
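For anyone wanting to try the vLLM route: a minimal sketch of building the request for a vLLM server's OpenAI-compatible endpoint. The model name, prompt, and URL below are placeholders, not from the model card; check the card for the exact served name and recommended OCR prompt.

```python
import base64

def build_ocr_request(image_bytes: bytes, model: str = "ocr-model",
                      prompt: str = "Convert this page to Markdown.") -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload.

    `model` and `prompt` are placeholders; substitute the served model id
    and the prompt recommended by the model card.
    """
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # Inline the page image as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
    }

# After serving the model (e.g. `vllm serve <model-id>`), POST this payload
# as JSON to http://localhost:8000/v1/chat/completions.
```

The same payload shape should work against SGLang's OpenAI-compatible server as well.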
Replies
Flowtica Scribe
Hi everyone!