Parse, extract, and split your hardest documents with unmatched accuracy. Read any layout with specialized vision models, and ship reliable pipelines in minutes, not months.
Extend already processes millions of pages daily for leading AI teams like @Brex, @Mercury, @Opendoor, and hundreds of others. Now, it's even better.
Parse 2.0 achieves SOTA quality on RealDoc-Bench, our open-source benchmark that measures agent success rates on the real-world docs that agents actually encounter in production.
We trained Parse 2.0 on 1M+ pages of the hardest documents seen in production. Here’s how it stacks up:
#1 in healthcare, real estate, logistics, and financial services
95.7% agent Q&A accuracy on 581 docs (next best: 92%)
Hi everyone! If anyone tells you that PDFs are solved, they probably haven't worked with the PDFs our customers see in production. We're talking bills of lading in shipping and logistics, clinical reports, IRS forms, etc.
Parse 2.0 lets your agents work with reliable inputs, no matter how hard the documents are. This allows you to build:
RAG systems that accurately answer questions with precise source citations
Automated workflows that speed up document processing
Agents that take action on documents (e.g., routing, classification, extraction)
Parse 2.0 is a SOTA, layout-first document parsing API for agents that need reliable inputs. It features:
A completely rebuilt layout model trained on 1M+ pages of the hardest docs
New specialized downstream OCR and VLM models for specific doc components (e.g., forms, tables, handwriting)
New reading order model to preserve semantic meaning (not every doc should be read left to right, top to bottom)
If you need accurate PDF parsing, check it out and let us know what you think!
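The feature list above describes a layout-first flow: detect typed regions, put them in reading order, and hand each region to a component-specific model. A minimal Python sketch of that shape follows. `Block`, the handler functions, and `naive_order` are hypothetical stand-ins, not Extend's API or models. The placeholder ordering is exactly the strict top-to-bottom rule that a learned reading order model is meant to improve on.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable

@dataclass
class Block:
    """Hypothetical region emitted by a layout model."""
    kind: str      # component type, e.g. "table", "form", "text"
    x0: float      # left edge
    y0: float      # top edge (y grows downward)
    content: str   # stand-in for the region's pixels

# Hypothetical handlers standing in for component-specific OCR/VLM models.
def parse_table(b: Block) -> str:
    return f"<table>{b.content}</table>"

def parse_form(b: Block) -> str:
    return f"<form>{b.content}</form>"

def parse_text(b: Block) -> str:
    return b.content

HANDLERS: dict[str, Callable[[Block], str]] = {"table": parse_table, "form": parse_form}

def naive_order(blocks: list[Block]) -> list[Block]:
    # Placeholder reading order: strictly top to bottom, then left to right.
    return sorted(blocks, key=lambda b: (b.y0, b.x0))

def parse_page(
    blocks: list[Block],
    order: Callable[[list[Block]], list[Block]] = naive_order,
) -> list[str]:
    # Order the regions, then route each one to its handler.
    # Unknown kinds fall back to plain-text handling.
    return [HANDLERS.get(b.kind, parse_text)(b) for b in order(blocks)]
```

Because the ordering step is a parameter, a smarter ordering function can replace `naive_order` without changing any of the per-component handlers.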
How do your specialized vision models handle multi-column layouts, mixed tables, or low-quality scanned PDFs compared to standard LLMs?
Hey @ingvar_borzov, great question! Standard LLMs are general-purpose and can be costly and high-latency for document parsing, especially on docs with the complex components you listed. You also get much less configuration control, and relying on prompt engineering is brittle. Our VLMs are fine-tuned for specific layout components like tables, forms, handwriting, and barcodes, and we layer on an optional agentic OCR loop for especially challenging edge cases.
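The reply says little about how the optional agentic OCR loop works. One plausible reading is confidence-based escalation: try cheaper recognizers first, and fall back to more expensive ones only when confidence stays low. The Python sketch below shows that pattern. The recognizer signature, the function name, and the 0.9 default `threshold` are all assumptions for illustration, not Extend's implementation.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical recognizer signature: image bytes in, (text, confidence in [0, 1]) out.
Recognizer = Callable[[bytes], Tuple[str, float]]

def ocr_with_escalation(
    region: bytes,
    recognizers: Sequence[Recognizer],
    threshold: float = 0.9,
) -> Tuple[str, float, int]:
    """Run recognizers cheapest-first, stopping once one clears `threshold`.

    Returns (text, confidence, index of the recognizer that produced it).
    If none clears the threshold, the highest-confidence result is returned.
    """
    if not recognizers:
        raise ValueError("need at least one recognizer")
    best: Tuple[str, float, int] = ("", -1.0, -1)
    for i, recognize in enumerate(recognizers):
        text, conf = recognize(region)
        if conf > best[1]:
            best = (text, conf, i)
        if conf >= threshold:
            break  # good enough; skip the more expensive fallbacks
    return best
```

With this design, easy regions only pay for the cheap model. The expensive fallbacks run only on the hard cases, which matches the cost and latency concern raised in the reply.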
The real unlock here isn’t OCR accuracy; it’s preserving semantic reading order under structural ambiguity.
Most pipelines break not on extraction, but on downstream assumptions about hierarchy (especially tables/forms where “correct text” ≠ “correct meaning flow”).
Curious: how do you handle evaluation when the ground-truth layout interpretation is subjective (e.g., multi-table docs or mixed narrative/forms)?
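The "correct text ≠ correct meaning flow" point in the comment above can be made concrete. An order-insensitive check can pass even when the reading order is wrong. Kendall's tau over block orderings is one common order-sensitive alternative. It is used here only for illustration and says nothing about how RealDoc-Bench actually scores reading order. The block IDs are hypothetical.

```python
from collections import Counter
from itertools import combinations

def same_text(pred: list[str], gold: list[str]) -> bool:
    # Order-insensitive check: the same blocks appear, in any order.
    return Counter(pred) == Counter(gold)

def kendall_tau(pred: list[str], gold: list[str]) -> float:
    """Kendall's tau between two orderings of the same distinct block IDs.

    1.0 means identical order, -1.0 means fully reversed.
    """
    if len(pred) != len(gold) or set(pred) != set(gold) or len(set(gold)) != len(gold):
        raise ValueError("orderings must be permutations of the same distinct IDs")
    n = len(gold)
    if n < 2:
        return 1.0
    rank = {block: i for i, block in enumerate(pred)}
    concordant = discordant = 0
    for a, b in combinations(gold, 2):  # a precedes b in the ground truth
        if rank[a] < rank[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Consider a two-column page read row by row, `["L1", "R1", "L2", "R2", "L3", "R3"]`, against a column-first ground truth of `["L1", "L2", "L3", "R1", "R2", "R3"]`. It passes `same_text` but scores a tau of 0.6 (12 concordant pairs, 3 discordant pairs, 15 pairs in total).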
@new_user___1452026946a93788355af99 The challenge with multi-table and mixed narrative docs comes down to reading order. With irregular forms, you sometimes have to read a whole column before moving to the next, rather than prescriptively going left to right and top to bottom. For reading order, ground truth is how a human would read the doc to extract meaning.
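The reply above describes column-first reading, where a whole column is finished before the next one starts. A simple geometric heuristic can approximate this. It groups boxes into columns by horizontal overlap, orders the columns left to right, and reads each column top to bottom. This is a hedged sketch, not Extend's model, which the post describes as a trained reading order model. `Box` and the 50% `min_overlap` default are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Hypothetical text region: horizontal extent and top edge."""
    id: str
    x0: float
    x1: float
    y0: float

def column_first_order(boxes: list[Box], min_overlap: float = 0.5) -> list[str]:
    """Assign boxes to columns by horizontal overlap, then read columns
    left to right and each column top to bottom."""
    columns: list[list] = []  # each entry: [col_x0, col_x1, boxes_in_column]
    for b in sorted(boxes, key=lambda b: b.x0):
        width = b.x1 - b.x0
        for col in columns:
            overlap = min(b.x1, col[1]) - max(b.x0, col[0])
            if width > 0 and overlap / width >= min_overlap:
                col[2].append(b)
                col[0], col[1] = min(col[0], b.x0), max(col[1], b.x1)
                break
        else:
            # No existing column overlaps enough: start a new one.
            columns.append([b.x0, b.x1, [b]])
    columns.sort(key=lambda c: c[0])
    return [b.id for col in columns for b in sorted(col[2], key=lambda b: b.y0)]
```

The heuristic breaks down on elements that span several columns, such as full-width headers or wide tables. Ambiguous cases like these are part of why a learned reading order model, evaluated against human reading order, is worth the effort.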
Kilo Code
"Over 1 billion PDFs are created every day, and your agents still can't read them reliably."
@Extend announced Parse 2.0, their new document parsing API.
Founder and CEO @kbyatnal on X:
Tech Marketing Framework
Here's a benchmark if you're interested in objective measures of performance! https://www.extend.ai/resources/realdocbench