Raptor - Hot-patch, cache, and protect your LLM API. Built in Rust.
Rust-powered AI gateway that actually slaps. Semantic caching: 500ms → 8ms. Semantic firewall: catches jailbreaks/malicious actors by intent, not keywords. Hot-patch: fix hallucinations without redeploying.
One-line change. Free tier. Your API bill will thank you.


The Math That Made Us Build This
I want to share the numbers that convinced us this problem was worth solving.
We analyzed the embedding costs of a typical contract management system:
The Scenario:
500 contracts
Each contract averages 3 versions (negotiation rounds)
~1,000 chunks per contract
Traditional RAG approach: every version gets processed from scratch — 500 × 3 × 1,000 = 1.5M chunk embeddings.
What actually changed between versions? Usually a handful of clauses. Most chunks are byte-identical to the previous version.
With version-aware processing: the first version is embedded in full; later versions contribute only their changed chunks.
That's 62% waste eliminated. At scale, we've seen teams hit 90%+ savings.
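The back-of-envelope arithmetic is easy to check yourself. This sketch assumes ~50 of 1,000 chunks change per negotiation round (the ballpark from the lineage example further down); your exact savings depend on how much each revision actually touches:

```python
contracts, versions, chunks_per_contract = 500, 3, 1_000
changed_per_round = 50  # assumption: ~5% of chunks change per negotiation round

# Naive pipeline: every version re-embedded in full
naive = contracts * versions * chunks_per_contract          # 1,500,000 embeddings

# Version-aware: embed the first version fully, then only the diff per revision
aware = contracts * (chunks_per_contract
                     + (versions - 1) * changed_per_round)  # 550,000 embeddings

waste_eliminated = 1 - aware / naive                        # ≈ 0.63
```

At ~5% churn per round you land in the low 60s, right around the 62% figure; heavier churn lowers the savings, lighter churn pushes them toward the 90%+ we've seen at scale.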
The crazier part? Most teams don't even know they're doing this. They see embedding costs rise linearly with document count and assume that's normal. It's not.
Why This Happens
Most document pipelines treat each upload as isolated. They have no concept of:
"This is version 2 of that contract"
"These 950 chunks are identical to what we already have"
"Only these 50 chunks are actually new"
So they re-process everything. Every time.
We built Raptor to track document lineage automatically. When you upload `contract_v2.pdf`, we:
1. Detect it's related to `contract_v1.pdf`
2. Diff the two at the chunk level
3. Return only the chunks that changed
You embed only the diff.
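The chunk-level diff can be sketched with plain content hashing. This is a minimal illustration of the idea, not Raptor's actual implementation — the function names and sample chunks are made up:

```python
import hashlib

def chunk_hashes(chunks):
    """Map each chunk's SHA-256 digest to its text."""
    return {hashlib.sha256(c.encode()).hexdigest(): c for c in chunks}

def diff_chunks(old_chunks, new_chunks):
    """Return only the chunks in new_chunks not already present in old_chunks."""
    seen = set(chunk_hashes(old_chunks))
    return [c for h, c in chunk_hashes(new_chunks).items() if h not in seen]

v1 = ["Clause 1: term A", "Clause 2: term B", "Clause 3: term C"]
v2 = ["Clause 1: term A", "Clause 2: term B (revised)", "Clause 3: term C"]

to_embed = diff_chunks(v1, v2)  # only the revised clause needs a new embedding
```

In practice you also want fuzzy matching (a reworded clause hashes differently even if it's 95% the same), which is where it stops being a ten-line script.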
The Other Problem We Solve
Cost is one thing. Data quality is another.
PyPDF extracts a financial table as flat text:
Revenue
$1,000,000
$1,200,000
COGS
$200,000
Your AI sees flat text and has to guess "$1,000,000" means Q1 Revenue. It guesses wrong. You get hallucinations.
Raptor preserves table structure:
| Metric | Q1 2024 | Q2 2024 |
|---------|------------|--------------|
| Revenue | $1,000,000 | $1,200,000 |
Now your AI knows exactly what each number means.
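The difference is easy to demonstrate if you render a cell grid back into markdown yourself. A toy illustration, assuming you already have headers and rows (this is not the extraction logic, just the final serialization step):

```python
def to_markdown_table(headers, rows):
    """Render a header row plus data rows as a GitHub-style markdown table."""
    lines = ["| " + " | ".join(headers) + " |",
             "|" + "|".join("---" for _ in headers) + "|"]
    for row in rows:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)

table = to_markdown_table(
    ["Metric", "Q1 2024", "Q2 2024"],
    [["Revenue", "$1,000,000", "$1,200,000"]],
)
```

The hard part isn't the serialization — it's recovering the cell grid from the PDF in the first place, which is what flat-text extractors throw away.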
Try It
If you're building RAG and want to see what you're currently wasting:
1. Process a document
2. Process an updated version
3. Check the dedup stats
The free tier is 1K pages/month. That's enough to run real tests.
Would love feedback on the SDK experience. We're optimizing for "Stripe-like" DX and want to know if we're hitting the mark.
Here from Australia 🦘 Let me know if you have any questions!