Taylor Moore

Raptor Data - Protect, cache, and hot-patch your LLM APIs. Built in Rust.

Rust-powered AI gateway that actually slaps. Semantic caching: 500ms → 8ms. Semantic firewall: catches jailbreaks and malicious actors by intent, not keywords. Hot-patch: fix hallucinations without redeploying. One line change. Free tier. Your API bill will thank you.


Replies

Taylor Moore
Maker
Hey Product Hunt, I'm Taylor. Former Australian fighter pilot, now software engineer. I built Raptor because I was tired of two things: burning money on redundant LLM calls and watching prompt injections slip through keyword filters.

What Raptor does: it sits between your app and your LLM provider. Every request flows through it and gets four things:

1. Semantic caching
Similar queries hit the same cache entry. "What's the weather in Sydney?" and "Sydney weather today?" return the same cached response in under 10ms instead of 500ms from the API. Cuts costs 30-50%. (There's a toy sketch of the general idea at the end of this comment.)

2. Semantic firewall
Blocks prompt injection by intent, not keywords. It catches jailbreaks that look nothing like your blocklist because it understands what the user is trying to do, not just what they typed.

3. Hot-patch
Your AI said something wrong in production? Override that specific response in seconds. No redeploy. No waiting for fine-tuning.

4. Observability
Trace and observe all of your AI requests in one intuitive dashboard. Useful for debugging, analysis, and more.

Integration takes 2 minutes
Change your base URL and add two headers. That's it.

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-openai-key",
    base_url="https://proxy.raptordata.dev/v1",
    default_headers={
        "X-Raptor-Api-Key": "rpt_your-key",
        "X-Raptor-Workspace-Id": "your-workspace-id"
    }
)

# Use the client normally; nothing else changes
response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello!"}]
)

Streaming works out of the box. Just pass stream=True (full snippet at the end of this comment).

Technical details
- Built in Rust with Axum/Tokio
- ONNX embeddings run locally in the binary (no external vectorisation calls)
- Total overhead: ~5ms per request
- Works with OpenAI, Anthropic, and any OpenAI-compatible API
- Full streaming support with mid-stream firewall monitoring

Verify it's working
Check the response headers:

X-Raptor-Cache: miss                 # "hit" when cached
X-Raptor-Latency-Ms: 5               # Raptor overhead
X-Raptor-Upstream-Latency-Ms: 450    # AI provider time

Make the same request twice. The second time you'll see X-Raptor-Cache: hit and a response time under 10ms. (A Python snippet that reads these headers is at the end of this comment.)

Free tier available, no credit card required. Sign up at https://www.raptordata.dev

I'll be here all day answering questions. Tell me what's missing, what's broken, or what would make this useful for your stack.
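One extra for anyone curious what "semantic" means mechanically in the caching layer: a toy Python sketch of the general idea is below. This is not Raptor's implementation (that runs in Rust with in-process ONNX embeddings); the ToySemanticCache class, the 0.9 threshold, and the plain linear scan with cosine similarity are illustrative placeholders only.

from __future__ import annotations
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class ToySemanticCache:
    """Toy illustration only, not Raptor's code: store (embedding, response)
    pairs and reuse a response when a new query's embedding is close enough
    to a cached one."""

    def __init__(self, threshold: float = 0.9):  # arbitrary example threshold
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []

    def lookup(self, query_embedding: np.ndarray) -> str | None:
        for cached_embedding, cached_response in self.entries:
            if cosine_similarity(cached_embedding, query_embedding) >= self.threshold:
                return cached_response  # cache hit: skip the upstream LLM call
        return None  # cache miss: call the provider, then store() the result

    def store(self, query_embedding: np.ndarray, response: str) -> None:
        self.entries.append((query_embedding, response))

The point is that "What's the weather in Sydney?" and "Sydney weather today?" produce nearby embeddings, so the second query reuses the first response even though the strings don't match.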
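Here's the full streaming snippet mentioned above. It's the standard OpenAI SDK streaming pattern; the only Raptor-specific parts are the base URL and the two headers from the integration example.

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-openai-key",
    base_url="https://proxy.raptordata.dev/v1",
    default_headers={
        "X-Raptor-Api-Key": "rpt_your-key",
        "X-Raptor-Workspace-Id": "your-workspace-id"
    }
)

# Tokens stream back as they arrive; the firewall monitors the stream mid-flight
stream = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)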
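And to read the X-Raptor-* headers from code: the with_raw_response wrapper below is standard OpenAI Python SDK functionality, not something Raptor adds, and client is the same instance configured in the snippet above.

# `client` is the OpenAI(...) instance pointed at proxy.raptordata.dev above
raw = client.chat.completions.with_raw_response.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "What's the weather in Sydney?"}]
)

print(raw.headers.get("X-Raptor-Cache"))                # "miss" first time, "hit" on a repeat
print(raw.headers.get("X-Raptor-Latency-Ms"))           # Raptor overhead
print(raw.headers.get("X-Raptor-Upstream-Latency-Ms"))  # AI provider time

response = raw.parse()  # the usual ChatCompletion object
print(response.choices[0].message.content)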
Taylor Moore

Currently experiencing some technical difficulties - we are working to resolve this ASAP.

Chilarai M

Amazing. Congrats on the launch!