Most LLM calls in production are repeats. Same questions, same prompts, sometimes worded slightly differently. SemanticGuard caches them. Sits between your app and OpenAI/Anthropic/Google, returns cache hits in <50ms, cuts costs 40-70%. One line of code to install. Shadow Mode shows your savings before you flip caching on. Every hit validated by your own AI so you never serve a wrong answer.

Built this because I was watching our own LLM bills climb. Most of our traffic was repeats: the same question worded differently, the same content-generation prompt with different inputs, the same lookup coming back the next day. Provider-side prompt caching only fires within minutes on byte-identical prefixes, so it caught maybe a tenth of the waste. So I built a gateway that understands when two requests mean the same thing.

The catch with semantic caching is correctness: if you serve a wrong answer once, trust is gone. So every cache hit is judged by your own cheapest model before it goes out. Failures get flagged automatically.

Integration was the other design constraint. If it takes more than one line, no one tries it. So it's just fetch: withSemanticGuard() in your AI SDK config. Shadow Mode lets you see your savings without serving any cached responses, then flip caching on when you trust the numbers. Would love feedback from anyone running LLMs in production, especially where the validation layer falls short.
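To make the flow above concrete, here is a minimal sketch of a semantic cache lookup with a validation step and a shadow flag. It is not SemanticGuard's actual code: the names `Entry`, `Decision`, `lookup`, the `validate` callback, and the `threshold` and `shadow` options are all assumptions for illustration, and the embedding and judge calls are passed in as plain synchronous functions where a real gateway would make async model calls.

```typescript
// Hypothetical sketch: none of these names come from SemanticGuard itself.
type Entry = { embedding: number[]; prompt: string; response: string };

type Decision =
  | { kind: "miss" }                          // forward the request to the provider
  | { kind: "shadow_hit"; response: string }  // would have hit; record it, still forward
  | { kind: "hit"; response: string };        // serve the cached response

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function lookup(
  cache: Entry[],
  prompt: string,
  embedding: number[],
  // Stand-in for the "judged by your own cheapest model" step.
  validate: (prompt: string, cached: Entry) => boolean,
  opts: { threshold: number; shadow: boolean },
): Decision {
  // Find the nearest cached entry by cosine similarity.
  let best: Entry | undefined;
  let bestScore = -Infinity;
  for (const e of cache) {
    const s = cosine(embedding, e.embedding);
    if (s > bestScore) {
      best = e;
      bestScore = s;
    }
  }
  // Too far away, or the judge rejected it: treat as a miss.
  if (!best || bestScore < opts.threshold) return { kind: "miss" };
  if (!validate(prompt, best)) return { kind: "miss" };
  // In shadow mode the hit is only reported, never served.
  return opts.shadow
    ? { kind: "shadow_hit", response: best.response }
    : { kind: "hit", response: best.response };
}
```

In this sketch, a caller would count `shadow_hit` results to estimate savings while still sending every request upstream, which is one plausible reading of how Shadow Mode could work.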

This is such a neat idea, Shadow Mode especially. Really lowers the barrier to just trying it out.

One thing I'm curious about, though: how does it handle queries that are semantically close but mean the opposite? Take "which foods are good for high blood pressure" vs. "which foods should I avoid for high blood pressure". These would probably sit pretty close in embedding space but need completely different answers. Does the validator catch that, or is this a known edge case you're still working on?

@amit_kamat1 it's a known case, and handling it requires using the intent as a gatekeeper. Good catch!
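For readers wondering what an intent gate might look like, here is a toy sketch. The reply above only says that intent acts as a gatekeeper; everything else here is an assumption: the `Intent` labels, the `AVOID_CUES` keyword list, and the functions `coarseIntent` and `sameIntent` are hypothetical, and a real system would more likely use a classifier or a model call than a regex.

```typescript
// Toy intent gate: illustrative only, not SemanticGuard's implementation.
type Intent = "seek" | "avoid";

// Hypothetical cue list; a keyword match is a crude stand-in for real intent detection.
const AVOID_CUES = /\b(avoid|shouldn't|should not|stay away from|bad for|worst|never)\b/i;

// Label a query as asking what to avoid, or (by default) what to seek.
function coarseIntent(query: string): Intent {
  return AVOID_CUES.test(query) ? "avoid" : "seek";
}

// A cached answer is only eligible if both queries carry the same coarse intent,
// regardless of how close their embeddings are.
function sameIntent(a: string, b: string): boolean {
  return coarseIntent(a) === coarseIntent(b);
}
```

With the two queries from the question, `sameIntent` returns `false`, so the cached answer for one would not be considered for the other even though the embeddings are close.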

SemanticGuard

Cuts your LLM API costs by 40-70%. One line of code.
