Respan makes it dead simple to build production-ready LLM applications. With 2 lines of code, developers get a complete DevOps platform that speeds up monitoring and evaluating AI apps.
This is the 4th launch from Respan.
Respan Gateway
Launched this week
Respan AI Gateway connects your app to 1,000+ AI models through one endpoint.
But routing is the easy part. Respan keeps production AI reliable and under control with fallbacks, retries, caching, spend limits, alerts, and full traces for every call.
Gateway, observability, evals, prompt management, monitors, and cost controls all run on one platform, so you do not need to stitch together five tools to debug production.





Free Options





Respan
Hi Product Hunt,
We built Respan AI Gateway because routing to more models is only the first step.
Once your AI product is in production, the harder questions show up fast:
What happens when a provider fails?
Which customer is driving cost?
Which model version caused the latency spike?
Did the fallback work?
How do we trace, evaluate, and control everything without stitching together five tools?
Respan Gateway gives teams one OpenAI- and Anthropic-compatible endpoint for 1,000+ models, with fallbacks, retries, caching, spend limits, alerts, traces, evals, prompt management, and monitors on the same platform.
The goal is simple: make production AI easier to ship, debug, and control.
Would love your feedback, questions, and support today!
@fran3cc Congrats on the launch! For teams that already have a multi-provider setup, what’s the simplest, lowest-risk way to try Respan Gateway in production, and what safeguards do you provide to ensure no customer-facing downtime or cost surprises during the transition?
Respan
@swati_paliwal Thanks Swati!
For teams with an existing multi-provider setup, the lowest-risk path is to start with one provider or one route first, then gradually expand traffic once everything looks good.
For customer-facing downtime, Respan Gateway can automatically fall back to another live provider when the primary provider fails, so requests do not just break at the first failure.
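A minimal sketch of the hard-error fallback idea described above, written as plain Python rather than against Respan's actual API. The names `ProviderError`, `call_with_fallback`, and the two stub providers are hypothetical and exist only for illustration.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error a provider callable raises on a hard failure."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
    retries_per_provider: int = 1,
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, reply) from the first success."""
    errors = []
    for name, call in providers:
        # One initial attempt plus `retries_per_provider` retries before moving on.
        for attempt in range(retries_per_provider + 1):
            try:
                return name, call(prompt)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real model endpoints (assumptions).
def flaky_primary(prompt: str) -> str:
    raise ProviderError("503 from primary")


def healthy_backup(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = call_with_fallback(
    "hello", [("primary", flaky_primary), ("backup", healthy_backup)]
)
print(used, reply)
```

The request only fails outright when every provider in the list has exhausted its retries, which is the behavior the reply describes.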
For cost surprises, teams can set per-API-key spend limits, caps, or blocking rules. Cost and usage changes can also be reported through Slack or email alerts, so teams know quickly before spend gets out of control.
AISA AI Skills Test
@fran3cc the evals piece is what makes this interesting to me. most teams have routing and fallbacks figured out, but almost nobody has a real answer for "how do I know this model is actually performing well in production" beyond eyeballing logs. curious how you handle eval drift over time as user inputs shift.
Respan
@ozandag Absolutely, that’s exactly why we built the evals feature. It’s easy to set up routing and fallbacks, but knowing how a model is actually performing in production is a different story. With Keywords AI, we continuously monitor outputs against intent and quality benchmarks so you can catch drift early. The system adapts as inputs shift, helping teams spot issues before they show up in logs. It’s still early, but so far it’s been a huge help for keeping performance consistent.
the DevOps platform framing alongside 2 lines of code is an interesting positioning tension. DevOps platforms usually require significant setup and ongoing configuration to be useful. 2 lines of code implies you get value immediately. curious which one is more accurate for a new user and at what point the simple integration becomes a platform with enough configuration to actually catch the problems that matter in production
Respan
@ansari_adin Totally get what you’re pointing out. The 2 lines of code are meant to get new users immediate value without the heavy setup, but once you start feeding live traffic and want deeper insights, Keywords AI naturally scales into more of a platform. That’s when configuration—like custom routing, spend limits, and eval tracking—starts to matter, helping catch the production issues that simple integration alone can’t surface.
Congrats on the launch! Genuine question from someone running multi-provider LLM calls in production: when a provider degrades mid-request (slow but not erroring), does the gateway support latency-based failover, or only hard-error fallback? And can the cost observability enforce per-provider daily caps, or is it reporting-only? The eval layer baked into the gateway is the part I haven't seen elsewhere — curious how you keep eval prompts from polluting the usage metrics.
Respan
@mikebrandswarm Great question!
Today, we support hard-error fallback, and latency-based failover is in the pipeline. For slow-but-not-erroring providers, we know this is a real production issue, so we’re designing it around configurable latency thresholds and safe handoff behavior.
On cost, it is not reporting-only. We support both soft caps and hard caps. Soft caps can trigger Slack / email alerts, while hard caps can block requests based on the settings you configure per API key, route, or provider.
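To make the soft cap versus hard cap distinction concrete, here is a small sketch. It is an assumption about how such a check could look, not Respan's implementation; `SpendPolicy` and `check_spend` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SpendPolicy:
    """Hypothetical per-key (or per-route / per-provider) budget settings in USD."""
    soft_cap_usd: float
    hard_cap_usd: float


def check_spend(spent_usd: float, request_cost_usd: float, policy: SpendPolicy) -> str:
    """Return 'block', 'alert', or 'allow' for a request given spend so far."""
    projected = spent_usd + request_cost_usd
    if projected > policy.hard_cap_usd:
        return "block"   # hard cap: reject the request
    if projected > policy.soft_cap_usd:
        return "alert"   # soft cap: let it through, notify Slack / email
    return "allow"


policy = SpendPolicy(soft_cap_usd=50.0, hard_cap_usd=100.0)
print(check_spend(10.0, 1.0, policy))
print(check_spend(49.5, 1.0, policy))
print(check_spend(99.5, 1.0, policy))
```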
For evals, we separate eval traffic from production traffic with metadata / tags / environments, so eval prompts can be traced and analyzed without polluting normal usage metrics like customer usage, token volume, latency, or production cost reporting.
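As a rough illustration of tag-based separation, the sketch below filters usage records by an `environment` field before aggregating. The record shape and field names are assumptions for this example, not Respan's schema.

```python
# Hypothetical usage records; each request carries an environment tag.
records = [
    {"request_id": "a1", "environment": "production", "tokens": 120},
    {"request_id": "a2", "environment": "eval", "tokens": 900},
    {"request_id": "a3", "environment": "production", "tokens": 80},
]


def tokens_for(records: list[dict], environment: str) -> int:
    """Sum token usage for a single environment tag only."""
    return sum(r["tokens"] for r in records if r.get("environment") == environment)


production_tokens = tokens_for(records, "production")
eval_tokens = tokens_for(records, "eval")
print(production_tokens, eval_tokens)
```

Because eval requests carry their own tag, they stay traceable without inflating the production totals.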
"2 lines of code, complete DevOps platform" always sounds a bit too good 😄
What breaks first when you try to use it on a real production agent with tool calls and long traces?
Respan
@workout097_collab Totally fair. The 2 lines are for getting traffic into the gateway and traces showing up, not pretending production agents are easy.
In real agents, the first things that break are rate limits, long tool-call chains, cost spikes, and not knowing which model/tool/prompt version caused the issue.
This is what many dev teams are missing. I’ve seen so many projects stall because they couldn’t effectively trace which model version caused a latency spike.
How does Respan handle 'evals' for non-deterministic outputs? Is it easy to set up automated regression tests for prompt changes?
Respan
@diana_nadim2 Hi,
For non-deterministic outputs, we don’t rely only on exact-match evals. Teams can evaluate outputs with a mix of LLM judges, rubric-based scoring, semantic checks, structured/schema checks, and custom pass/fail criteria depending on the task.
For prompt changes, yes, the goal is to make regression testing easy. You can keep a dataset of representative inputs, run a new prompt or model version against the same cases, compare scores against the previous version, and catch quality, latency, or cost regressions before rolling it out!
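The regression workflow above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the dataset, the stub `baseline` and `candidate` generators, the substring scorer, and the `tolerance` value are all hypothetical, and a real setup would likely use LLM judges or rubric scoring instead.

```python
from statistics import mean
from typing import Callable


def contains_expected(output: str, expected: str) -> float:
    """Simple semantic-ish check: 1.0 if the expected answer appears in the output."""
    return 1.0 if expected.lower() in output.lower() else 0.0


def run_eval(generate: Callable[[str], str], dataset: list[dict]) -> float:
    """Average score of one prompt/model version over a fixed dataset."""
    return mean(contains_expected(generate(c["input"]), c["expected"]) for c in dataset)


def passes_gate(baseline_score: float, candidate_score: float, tolerance: float = 0.02) -> bool:
    """Candidate passes only if it does not drop more than `tolerance` below baseline."""
    return candidate_score >= baseline_score - tolerance


dataset = [
    {"input": "capital of France?", "expected": "Paris"},
    {"input": "2 + 2?", "expected": "4"},
]


# Stubs standing in for the old and new prompt versions (assumptions).
def baseline(text: str) -> str:
    return {"capital of France?": "Paris is the capital.", "2 + 2?": "The answer is 4."}[text]


def candidate(text: str) -> str:
    return {"capital of France?": "Paris.", "2 + 2?": "five"}[text]


baseline_score = run_eval(baseline, dataset)
candidate_score = run_eval(candidate, dataset)
print(baseline_score, candidate_score, passes_gate(baseline_score, candidate_score))
```

Running both versions against the same cases is what makes the scores comparable, since non-deterministic outputs are judged by criteria rather than exact match.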
Having caching and fallbacks baked into one endpoint is a massive win for customer-facing AI features like conversational marketing bots. How does the gateway handle latency during failovers? Is the switch seamless enough that the end-user won't notice a lag?
Respan
@andika_fadhilah this matters a lot!
For failovers, there is usually a small retry / routing latency, since the gateway needs to detect the provider issue and move the request to another live provider. In most cases, the end user does not notice much beyond a slightly slower response.
DIY UX Test
Putting evals at the gateway layer instead of bolting them on downstream is a smart place to catch regressions before they reach prod. Does Respan run evals against live traffic samples, or is it more of a pre-deploy gate?
Respan
@oleksii_sekundant we support both!
Teams can run evals as a pre-deploy gate before rolling out a new prompt, model, or workflow version. That helps catch regressions before they hit production.
They can also run evals on live traffic samples, so you can monitor quality over real user behavior instead of only testing against static cases.
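One common way to pick live traffic samples is deterministic hash-based sampling, sketched below. This is a general technique and an assumption on our part, not a description of how Respan selects samples; `should_sample` and the request IDs are hypothetical.

```python
import hashlib


def should_sample(request_id: str, rate: float) -> bool:
    """Deterministically select roughly `rate` of requests for evaluation."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


ids = [f"req-{i}" for i in range(10_000)]
sampled = [rid for rid in ids if should_sample(rid, 0.10)]
print(len(sampled))
```

Hashing the request ID means the same request always gets the same decision, so reruns and retries do not change which traffic ends up in the eval set.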