One thing that could probably improve is giving users more transparency into confidence scores and extraction reasoning. For sensitive workflows like legal or financial processing, visibility into why certain outputs were generated can help teams trust the system more.
Parsewise
Hello from Greg, Max and the Parsewise team!
Having seen Parsewise provide tremendous value in production for our enterprise customers (incl. UBS, Compre Group, Thinksurance), we are excited to launch our API!
The Problem
Today, you have to build and maintain complex document processing pipelines with changing business rules. You parse, classify, and rely on structured responses from LLMs or IDP tools (e.g., Reducto) to get individual extraction results that you piece together with other bits of information. There’s no reliable way to catch when information contradicts itself, which is risky. Finally, you build a custom verification UI for your operations team to deal with LLM mistakes.
The Solution
We provide an API to abstract all that away into a single call. You provide multiple documents and the desired output to get back a response with resolved values, flagged contradictions, and full traceability across documents / pages that you can display in your own app.
Get Started
=================================================
Sign up with free credits: https://www.parsewise.ai/get-started
(use the Agents.md for a 1-minute integration)
=================================================
Parsewise
Max here, @greg_csegzi's co-founder. One quick add for the builders: if you want to look under the hood before signing up, our docs and quickstart are fully public.
Docs: https://docs.parsewise.ai/
Getting started: https://docs.parsewise.ai/getting-started
The getting started guide walks you from zero to your first multi-document call, with resolved values, flagged contradictions, and full traceability, in a few minutes. If you hit anything confusing or have a use case you're not sure we handle, write a comment below. Greg and I are here and will answer every one.
The lineage-down-to-bounding-boxes is the part I'd actually pay for — most extraction APIs hand you a value plus a confidence score and leave you to trust it. But bounding boxes only make sense for visual docs; when the source is a spreadsheet or plain text, what does "lineage" resolve to — a cell reference, a line number, nothing? The human-validation UI only works if every value points back to something clickable.
Parsewise
@sounak_bhattacharya very good point!
For Excel files we do provide cell references, along with a focused Excel preview:
(see here live)
We have found that this kind of preview helps operational users not have to go digging back to the original spreadsheet.
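To make the idea of "every value points back to something clickable" concrete, here is a rough sketch of how a provenance pointer could be modelled. It is not the Parsewise response format: the class names, fields, and the `describe` helper are all hypothetical, and it only covers the two cases mentioned in this thread (bounding boxes for visual documents, cell references for spreadsheets).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical locator for visual documents such as scanned PDFs."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class CellRef:
    """Hypothetical locator for spreadsheets."""
    sheet: str
    cell: str  # spreadsheet-style address, e.g. "B7"


# A value's provenance is one of the locator kinds above.
Locator = Union[BoundingBox, CellRef]


def describe(document: str, loc: Locator) -> str:
    """Render a locator as a label a review UI could show next to a value."""
    if isinstance(loc, BoundingBox):
        return (
            f"{document}, page {loc.page}, "
            f"box ({loc.x0}, {loc.y0})-({loc.x1}, {loc.y1})"
        )
    if isinstance(loc, CellRef):
        return f"{document}, sheet {loc.sheet!r}, cell {loc.cell}"
    raise TypeError(f"unsupported locator: {type(loc).__name__}")


print(describe("rates.xlsx", CellRef("Q1", "B7")))
print(describe("contract.pdf", BoundingBox(3, 72.0, 140.5, 210.0, 152.0)))
```

Keeping the locator a small tagged union means a verification UI can branch on the locator kind to decide whether to open a page view or a sheet preview.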
Contradiction detection across documents is the part most people build manually and badly. When your agent pulls data from 3 sources and they disagree, knowing which one to trust without a human checking every time is a hard problem. How does the lineage tracing work in practice?
Parsewise
@tina_chhabra very good question, and indeed it's a pain, especially because different LLMs may even pick different documents to trust!
Lineage tracing in practice means that a user can:
- start from a resolved value
- click to go deep and see all underlying values and the logic used to arrive at the resolved value
- for each underlying value, see whether it agrees with the resolved value, along with a word-level bounding box for its provenance
There are 3 core components that make this possible:
1. For any one data point, we pull ALL of the relevant sources, even if that's across 15 documents and 20 pages
2. For all of these hits, we need to compare them and decide whether 1 correct answer needs to be picked from them or whether they need combining
3. The logic for reconciling is explicitly written out when the user defines their initial target, so they can edit it, and our agents can make suggestions when a previously unseen disagreement occurs
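The three components above can be sketched in a few lines. This is an illustration only, not how Parsewise implements it: `Hit`, `Resolved`, and `resolve_by_majority` are hypothetical names, and a simple majority vote stands in for whatever reconciliation logic a user would actually define.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One occurrence of a data point in a source document (hypothetical shape)."""
    document: str
    page: int
    value: str


@dataclass
class Resolved:
    """A resolved value together with the evidence behind it."""
    value: str
    rule: str                 # the reconciliation logic that was applied
    supporting: list[Hit]     # hits that agree with the resolved value
    contradicting: list[Hit]  # hits that disagree, kept for review


def resolve_by_majority(hits: list[Hit]) -> Resolved:
    """Pick the most common value and keep every hit for traceability."""
    if not hits:
        raise ValueError("no hits to reconcile")
    counts = Counter(h.value for h in hits)
    winner, _ = counts.most_common(1)[0]
    return Resolved(
        value=winner,
        rule="majority vote across all hits",
        supporting=[h for h in hits if h.value == winner],
        contradicting=[h for h in hits if h.value != winner],
    )


# Step 1: all hits for one data point, gathered across documents.
hits = [
    Hit("policy.pdf", 3, "2024-01-01"),
    Hit("schedule.xlsx", 1, "2024-01-01"),
    Hit("email.txt", 1, "2024-02-01"),
]

# Steps 2 and 3: compare the hits and apply an explicit, named rule.
result = resolve_by_majority(hits)
print(result.value, result.rule)
for hit in result.contradicting:
    print("disagrees:", hit.document, "page", hit.page, "->", hit.value)
```

Because the rule is stored as data next to the supporting and contradicting hits, a reviewer can see both the outcome and why it was chosen, and the rule can be swapped out without touching the evidence.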
Parsewise
@tina_chhabra Hey Tina, you're so right regarding building in-house. We often see customers building document-by-document extraction tools that fall apart when the data has inconsistencies.
Building on Greg's point above, we allow users to set guidelines that help our agents decide what the "correct" value is.
A screenshot below from app.parsewise.ai shows an example:
You can also play around with some of our demos here: demo.parsewise.ai
How does it handle documents with different formats in the same call, say a scanned PDF, a spreadsheet, and a plain text file? Does accuracy drop when mixing formats?
Parsewise
@aanchal_dahiya Hey Aanchal, we support mixing formats! Our agentic system processes each file independently and in full, whilst also combining context with all other files in the set. This means that there's no accuracy drop as more files are added. You should try it out!
@shansingh oh cool, is there a free tier or sandbox to test with a few docs before committing?
Parsewise
@aanchal_dahiya hey Aanchal, there is indeed - our free tier should allow you to try it on a few thousand pages!
https://www.parsewise.ai/get-started
The contradiction-detection feature is the part most agent doc tools skip. The honest question I have: when two documents contradict, who decides which one wins? Do you surface both to a human reviewer with the conflict highlighted, or does the agent pick a winner and log the call?
Asking because the regulated customers you listed (UBS especially) presumably need an audit trail on that decision, not just the final value.