Taylor Moore

Raptor - Hot-patch, cache, and protect your LLM API. Built in Rust.

Rust-powered AI gateway that actually slaps. Semantic caching: 500ms → 8ms. Semantic firewall: catches jailbreaks/malicious actors by intent, not keywords. Hot-patch: fix hallucinations without redeploying. One line change. Free tier. Your API bill will thank you.

Taylor Moore
Maker
We built the "Stripe for Ingestion" so you can stop writing parsers.

Hey Product Hunt! I'm the maker of Raptor Data.

We built Raptor Data because "Day 2" of building RAG apps is miserable. You build the prototype, but then users start updating documents. Suddenly you're writing complex Python scripts to parse PDFs, managing versions, and running up huge AWS bills to re-embed 500-page files just because a date changed.

We wanted a "Stripe-like" experience for this: one line of code to handle the dirty work.

What is Raptor? It's a lightweight, Edge-ready TypeScript SDK (other language support coming soon) that gives you:

  1. Git-like versioning: we hash every chunk and sentence. If you upload v2.pdf, we diff it against v1.pdf and return only the chunks that changed.

  2. Universal parsing: PDF and DOCX, with the edge cases, zip bombs, and encoding nightmares handled for you.

  3. Cost savings: by embedding only the "diffs," our users see a ~90% reduction in vector DB and OpenAI costs.

The stack: the SDK is isomorphic, so it runs in Node, the browser, or Cloudflare Workers. The backend is FastAPI, heavily optimized for parsing speed.

We have a free tier (1k pages). I'd love for you to npm install @raptor-data/ts-sdk and tell me if the DX lives up to the promise. Happy shipping and thanks for your support!
Taylor Moore

The Math That Made Us Build This

I want to share the numbers that convinced us this problem was worth solving.

We analyzed the embedding costs of a typical contract management system:

The Scenario:

  • 500 contracts

  • Each contract averages 3 versions (negotiation rounds)

  • ~1,000 chunks per contract

Traditional RAG approach:

500 contracts × 3 versions × 1,000 chunks = 1.5M chunks embedded

What actually changed between versions?

v1 → v2: Usually 5-10% of content

v2 → v3: Usually 2-5% of content

With version-aware processing:

500 × 1,000 (initial) = 500K chunks

500 × 75 (v2 avg) = 37.5K chunks

500 × 50 (v3 avg) = 25K chunks

Total: 562.5K chunks

That's 62.5% of the embedding work eliminated. At scale, we've seen teams hit 90%+ savings.

The crazier part? Most teams don't even know they're doing this. They see embedding costs rise linearly with document count and assume that's normal. It's not.

Why This Happens

Most document pipelines treat each upload as isolated. They have no concept of:

  1. "This is version 2 of that contract"

  2. "These 950 chunks are identical to what we already have"

  3. "Only these 50 chunks are actually new"

So they re-process everything. Every time.

We built Raptor to track document lineage automatically. When you upload `contract_v2.pdf`, we:

  1. Detect it's related to `contract_v1.pdf`

  2. Diff at the chunk level

  3. Return only what changed

  4. You only embed the diff
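For the curious, the core of steps 2-3 is content hashing. Here's a minimal sketch in TypeScript (illustrative only, not the actual SDK internals; chunking and lineage detection are elided):

```typescript
import { createHash } from "node:crypto";

// Hash a chunk's normalized text so identical content gets identical keys.
const hashChunk = (text: string): string =>
  createHash("sha256").update(text.trim()).digest("hex");

// Return only the chunks in the new version whose content wasn't seen before.
function diffChunks(oldChunks: string[], newChunks: string[]): string[] {
  const seen = new Set(oldChunks.map(hashChunk));
  return newChunks.filter((chunk) => !seen.has(hashChunk(chunk)));
}

const v1 = ["Term: 12 months.", "Fee: $10,000.", "Governing law: NY."];
const v2 = ["Term: 24 months.", "Fee: $10,000.", "Governing law: NY."];
console.log(diffChunks(v1, v2)); // only the changed term needs re-embedding
```

Unchanged chunks never reach your embedding model, which is where the savings come from.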

The Other Problem We Solve

Cost is one thing. Data quality is another.

PyPDF extracts this financial table:

Revenue

$1,000,000

$1,200,000

COGS

$200,000

Your AI sees flat text and has to guess that "$1,000,000" means Q1 Revenue. It guesses wrong. You get hallucinations.

Raptor preserves table structure:

| Metric | Q1 2024 | Q2 2024 |

|---------|------------|--------------|

| Revenue | $1,000,000 | $1,200,000 |

Now your AI knows exactly what each number means.
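To see why the structured form helps, here's a tiny hypothetical helper (not part of the SDK) that serializes parsed table cells back into Markdown, so every value stays attached to its row label and column header:

```typescript
// Render parsed table cells as a Markdown table for the LLM context window.
function toMarkdownTable(headers: string[], rows: string[][]): string {
  const line = (cells: string[]) => `| ${cells.join(" | ")} |`;
  return [
    line(headers),
    line(headers.map(() => "---")), // Markdown delimiter row
    ...rows.map(line),
  ].join("\n");
}

const table = toMarkdownTable(
  ["Metric", "Q1 2024", "Q2 2024"],
  [["Revenue", "$1,000,000", "$1,200,000"]],
);
console.log(table);
```

The model now reads "$1,200,000" in the same row as "Revenue" and the same column as "Q2 2024", instead of as a bare number floating in flat text.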

Try It

If you're building RAG and want to see what you're currently wasting:

1. Process a document

2. Process an updated version

3. Check the dedup stats

The free tier is 1K pages/month. That's enough to run real tests.

Would love feedback on the SDK experience. We're optimizing for "Stripe-like" DX and want to know if we're hitting the mark.

Taylor Moore

Here from Australia 🦘 Let me know if you have any questions!