fmerian

Extend - Parse any PDF layout with SOTA accuracy for AI pipelines

Parse, extract, and split your hardest documents with unmatched accuracy. Read any layout with specialized vision models, and ship reliable pipelines in minutes, not months.

fmerian

"Over 1 billion PDFs are created every day, and your agents still can't read them reliably."

@Extend announced Parse 2.0, their new document parsing API.

Founder and CEO @kbyatnal on X:

Extend already processes millions of pages daily for leading AI teams like @Brex, @Mercury, @Opendoor, and hundreds of others. Now, it's even better.

Parse 2.0 achieves SOTA quality on RealDoc-Bench, our open-source benchmark that measures agent success rate on the real-world docs agents actually encounter in production.

We trained Parse 2.0 on 1M+ pages of the hardest documents seen in production. Here’s how it stacks up:

  • #1 in healthcare, real estate, logistics, and financial services

  • 95.7% agent Q&A accuracy on 581 docs (next best: 92%)

  • 0.847 F1 on layout (next best: 0.759)

Jing

Hi everyone! If anyone tells you that PDFs are solved, they probably haven't worked with the PDFs our customers see in production. We're talking bills of lading in shipping and logistics, clinical reports, IRS forms, etc.

Parse 2.0 lets your agents actually work with reliable inputs, no matter how hard the documents are. This allows you to build:

  • RAG systems that accurately answer questions with precise citation sourcing

  • Automated workflows that accelerate document processing

  • Agents that take action on documents (e.g., routing, classification, extraction)

Parse 2.0 is a SOTA, layout-first document parsing API for agents that need reliable inputs. It features:

  • A completely rebuilt layout model trained on 1M+ of the hardest docs

  • New specialized OCR and VLM downstream models to handle specific doc components (e.g., forms, tables, handwriting)

  • New reading order model to preserve semantic meaning (not every doc should be read left to right, top to bottom)

If you need accurate PDF parsing, check it out and let us know what you think!

Ingvar Borzov

How do your specialized vision models handle multi-column layouts, mixed tables, or low-quality scanned PDFs compared to standard LLMs?

Jing

Hey @ingvar_borzov, great question! Standard LLMs are general-purpose and can be costly and high-latency for doc parsing, especially on docs with the complex components you listed. You also get much less configuration control, and relying on prompt engineering is brittle. Our VLMs are fine-tuned to handle specific layout components like tables, forms, handwriting, and barcodes, and we layer on an optional agentic OCR loop for especially challenging edge cases.

Here's a benchmark if you're interested in objective measures of performance! https://www.extend.ai/resources/realdocbench

Anna Kulyk

The real unlock here isn’t OCR accuracy; it’s preserving semantic reading order under structural ambiguity.

Most pipelines break not on extraction, but on downstream assumptions about hierarchy (especially tables/forms where “correct text” ≠ “correct meaning flow”).

Curious: how do you handle evaluation when the ground-truth layout interpretation is subjective (e.g., multi-table docs or mixed narrative/forms)?

Jing

@new_user___1452026946a93788355af99 The challenge with multi-table and mixed narrative docs comes down to reading order. With irregular forms, you sometimes have to read a whole column before moving to the next, rather than prescriptively going left to right, top to bottom. For reading order, ground truth is how a human would read the doc to extract meaning.
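To make the column-first idea concrete, here is a minimal sketch contrasting a naive top-to-bottom, left-to-right sort with a column-first order. It is a simple geometric heuristic, not Extend's reading order model (which the thread describes as a trained model). It assumes text blocks have already been detected with `(x0, y0, x1, y1)` bounding boxes; the names `Block`, `naive_order`, `group_columns`, `column_first_order`, the `min_gap` threshold, and the sample shipping-form data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A detected text block with a bounding box (origin at top-left)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def naive_order(blocks: list[Block]) -> list[str]:
    """Strict top-to-bottom, then left-to-right sort on each block's top-left corner."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y0, b.x0))]


def group_columns(blocks: list[Block], min_gap: float = 20.0) -> list[list[Block]]:
    """Group blocks into columns by sweeping their x-extents left to right.

    Hypothetical heuristic: a new column starts when a block begins more than
    `min_gap` units to the right of the current column's rightmost edge.
    """
    columns: list[list[Block]] = []
    right_edge = float("-inf")
    for b in sorted(blocks, key=lambda b: b.x0):
        if columns and b.x0 <= right_edge + min_gap:
            columns[-1].append(b)
            right_edge = max(right_edge, b.x1)
        else:
            columns.append([b])
            right_edge = b.x1
    return columns


def column_first_order(blocks: list[Block], min_gap: float = 20.0) -> list[str]:
    """Read each column top to bottom before moving on to the next column."""
    ordered: list[str] = []
    for column in group_columns(blocks, min_gap):
        ordered.extend(b.text for b in sorted(column, key=lambda b: b.y0))
    return ordered


# Made-up two-column form: shipper fields on the left, consignee fields on the right.
page = [
    Block("Shipper: ACME", 0, 0, 200, 20),
    Block("Address: 1 Main St", 0, 30, 200, 50),
    Block("Phone: 555-0100", 0, 60, 200, 80),
    Block("Consignee: Globex", 300, 0, 500, 20),
    Block("Address: 9 Elm Ave", 300, 30, 500, 50),
]

print(naive_order(page))         # interleaves the two columns, so addresses lose their party
print(column_first_order(page))  # keeps each party's fields together
```

The naive sort attaches "Address: 1 Main St" right after "Consignee: Globex", which is exactly the "correct text, wrong meaning flow" failure raised above. A gap heuristic like this one breaks down on full-width headers, tables, and irregular forms, which is where a learned reading order model is meant to help.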