Respan Gateway - One AI gateway with built-in observability and evals
Respan AI Gateway connects your app to 1,000+ AI models through one endpoint.
But routing is the easy part. Respan keeps production AI reliable and under control with fallbacks, retries, caching, spend limits, alerts, and full traces for every call.
Gateway, observability, evals, prompt management, monitors, and cost controls all run on one platform, so you do not need to stitch together five tools to debug production.


Replies
Having caching and fallbacks baked into one endpoint is a massive win for customer-facing AI features like conversational marketing bots. How does the gateway handle latency during failovers? Is the switch seamless enough that the end-user won't notice a lag?
Respan
@andika_fadhilah this matters a lot!
For failovers, there is usually a small retry / routing latency, since the gateway needs to detect the provider issue and move the request to another live provider. In most cases, the end user does not notice much beyond a slightly slower response.
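As a rough illustration of where that extra latency comes from, here is a minimal sketch of hard-error fallback. It is not Respan's implementation: `ProviderError`, `call_with_fallback`, and the two stand-in providers are hypothetical names used only for this example.

```python
import time
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails outright."""


def call_with_fallback(
    providers: Sequence[Tuple[str, Callable[[str], str]]], prompt: str
) -> Tuple[str, str, float]:
    """Try each provider in order; return (provider name, response, elapsed seconds)."""
    start = time.monotonic()
    errors = []
    for name, call in providers:
        try:
            response = call(prompt)
        except ProviderError as exc:
            # Detecting the failure and moving on is the added failover latency.
            errors.append(f"{name}: {exc}")
            continue
        return name, response, time.monotonic() - start
    raise ProviderError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    # Stand-in for a provider that is currently down.
    raise ProviderError("503 from upstream")


def healthy(prompt: str) -> str:
    # Stand-in for a live provider.
    return f"echo: {prompt}"


name, response, elapsed = call_with_fallback(
    [("primary", flaky), ("backup", healthy)], "hello"
)
print(name, response)
```

The caller only sees one response, served by the backup provider, with `elapsed` covering the failed attempt plus the successful one.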
I don't work in AI infra but even from the outside, the "something broke and you don't know why" problem makes total sense. having one place to see what's happening instead of piecing it together sounds like it saves a lot of pain. congrats on the launch.
Respan
@sidraarifali That’s exactly the pain we hear from teams. The first version usually works fine, but once real users, providers, prompts, and costs are involved, it gets messy fast.
DIY UX Test
Putting evals at the gateway layer instead of bolting them on downstream is a smart place to catch regressions before they reach prod. Does Respan run evals against live traffic samples, or is it more of a pre-deploy gate?
Respan
@oleksii_sekundant we support both!
Teams can run evals as a pre-deploy gate before rolling out a new prompt, model, or workflow version. That helps catch regressions before they hit production.
They can also run evals on live traffic samples, so you can monitor quality over real user behavior instead of only testing against static cases.
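One common way to pick live traffic samples is deterministic hash-based sampling, sketched below. This is an assumption about how sampling could work, not a description of Respan's internals; `in_eval_sample` and the request ID format are hypothetical.

```python
import hashlib


def in_eval_sample(request_id: str, sample_rate: float) -> bool:
    """Return True if this request falls inside the eval sample.

    The request ID is hashed to a stable value in [0, 1), so the same
    request always gets the same decision for a given sample rate.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


request_ids = [f"req-{i}" for i in range(1000)]
sampled = [rid for rid in request_ids if in_eval_sample(rid, 0.1)]
print(f"{len(sampled)} of {len(request_ids)} requests selected for evals")
```

Because the decision depends only on the request ID, raising the sample rate keeps every previously sampled request in the sample.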
Congrats on the launch! Genuine question from someone running multi-provider LLM calls in production: when a provider degrades mid-request (slow but not erroring), does the gateway support latency-based failover, or only hard-error fallback? And can the cost observability enforce per-provider daily caps, or is it reporting-only? The eval layer baked into the gateway is the part I haven't seen elsewhere — curious how you keep eval prompts from polluting the usage metrics.
Respan
@mikebrandswarm Great question!
Today, we support hard-error fallback, and latency-based failover is in the pipeline. For slow-but-not-erroring providers, we know this is a real production issue, so we’re designing it around configurable latency thresholds and safe handoff behavior.
On cost, it is not reporting-only. We support both soft caps and hard caps. Soft caps can trigger Slack / email alerts, while hard caps can block requests based on the settings you configure per API key, route, or provider.
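To make the soft cap / hard cap distinction concrete, here is a minimal sketch. The `SpendCap` class, its field names, and the dollar amounts are hypothetical, and a real setup would send the alert to Slack or email rather than append to a list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpendCap:
    """Hypothetical spend tracker for one API key, route, or provider."""

    soft_limit_usd: float
    hard_limit_usd: float
    spent_usd: float = 0.0
    alerts: List[str] = field(default_factory=list)

    def authorize(self, estimated_cost_usd: float) -> bool:
        """Return True if the request may proceed, recording its cost."""
        projected = self.spent_usd + estimated_cost_usd
        if projected > self.hard_limit_usd:
            # Hard cap: block the request and do not record any spend.
            return False
        if projected > self.soft_limit_usd:
            # Soft cap: let the request through but raise an alert.
            self.alerts.append(f"soft cap exceeded: projected ${projected:.2f}")
        self.spent_usd = projected
        return True


cap = SpendCap(soft_limit_usd=5.0, hard_limit_usd=10.0)
results = [cap.authorize(4.0), cap.authorize(3.0), cap.authorize(4.0)]
print(results, cap.alerts)
```

In this run the first request passes quietly, the second passes with an alert, and the third is blocked because it would push spend past the hard cap.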
For evals, we separate eval traffic from production traffic with metadata / tags / environments, so eval prompts can be traced and analyzed without polluting normal usage metrics like customer usage, token volume, latency, or production cost reporting.
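A minimal sketch of that separation, assuming each logged request carries an environment tag. The `RequestLog` shape, the tag values, and the numbers are hypothetical and only illustrate the filtering idea.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RequestLog:
    """Hypothetical log record; `environment` is the tag that separates traffic."""

    environment: str  # e.g. "production" or "eval"
    tokens: int
    cost_usd: float


def production_usage(logs: List[RequestLog]) -> Dict[str, float]:
    """Aggregate usage metrics over production-tagged requests only."""
    prod = [log for log in logs if log.environment == "production"]
    return {
        "requests": len(prod),
        "tokens": sum(log.tokens for log in prod),
        "cost_usd": round(sum(log.cost_usd for log in prod), 6),
    }


logs = [
    RequestLog("production", tokens=1200, cost_usd=0.03),
    RequestLog("eval", tokens=800, cost_usd=0.02),
    RequestLog("production", tokens=300, cost_usd=0.01),
]
usage = production_usage(logs)
print(usage)
```

The eval request is still stored and traceable, it is just excluded from the production totals.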
Humalike
Interesting take with Respan: Self-driving AI observability and evals for agents. What made you decide to build this now?
Respan
@borrellbr We started with observability because that’s the first major pain teams hit in production. Once real users are making LLM calls, you need to know what happened, which model was used, why something failed, and where cost or latency is coming from.
Evals became the natural next step because once you have the traces and data, you can do more than just look back manually. You can start checking quality, regressions, and failures proactively.
That’s also why we’re moving toward more self-driving observability. Teams should not have to open the dashboard every day just to find problems. The platform should surface the important issues, run checks, and help teams catch things before they become bigger production problems.
Conduit AI
Incredible team and product!
Respan
Thanks for the support, @punn_kam. Appreciated!
Huge fan of the routing and spend-limiting features so far.
It really bridges the gap between a standard API router and a full-scale LLMOps production platform.
Having traces baked in makes managing live traffic so much cleaner.
Respan
@kevin_huang_ynng_ Thanks so much! Glad to hear the routing and spend-limiting features are hitting the mark. That gap between a simple API router and full-scale LLMOps is exactly what we were aiming to solve. Having traces baked in definitely helps keep live traffic manageable, and it's great to hear it's making a difference on your end.
The Prompting Company
Congrats on the launch!!
Respan
Thanks for the support, @michelle_marcelline. Appreciated!
Respan
Thanks for the comment, @hamza_afzal_butt. For niche or low-volume keywords, Keywords AI tries to go beyond just raw search data. It looks at semantic relevance, context, and related intent to surface opportunities that traditional tools might miss. The idea is to give actionable insights even when volume is low, so you can still target terms that have real potential.
Protaigé
Good stuff however I do not think routing is the easy part. It's only easy if it's not done properly. Routing needs to figure out best model. Best model needs to define criteria for 'best'. If it's best output + speed + price, then routing needs to detect intent behind what's flowing through it and adjust accordingly.
Respan
@ali_shaheen Totally get what you’re saying. Routing can feel deceptively simple until you start factoring in output quality, speed, and cost. Detecting intent accurately and dynamically adjusting to pick the right model is really where it gets tricky. That balance is something we’ve been thinking a lot about with Keywords AI: making it smart enough to choose the best model for the task without slowing things down or driving up cost.
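The "define criteria for best" point can be sketched as a weighted score over quality, speed, and price. Everything here is hypothetical: the model names, the 0-to-1 ratings, and the weights are made up for illustration, and the harder step of detecting intent to choose the weights is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model ratings, each normalized to 0..1 (higher is better)."""

    name: str
    quality: float
    speed: float
    cheapness: float


def pick_model(models: List[ModelProfile], weights: Dict[str, float]) -> ModelProfile:
    """Return the model with the highest weighted score for the given criteria."""

    def score(model: ModelProfile) -> float:
        return (
            weights["quality"] * model.quality
            + weights["speed"] * model.speed
            + weights["cheapness"] * model.cheapness
        )

    return max(models, key=score)


models = [
    ModelProfile("large-model", quality=0.9, speed=0.3, cheapness=0.2),
    ModelProfile("small-model", quality=0.6, speed=0.9, cheapness=0.9),
]
quality_first = pick_model(models, {"quality": 1.0, "speed": 0.0, "cheapness": 0.0})
balanced = pick_model(models, {"quality": 0.4, "speed": 0.3, "cheapness": 0.3})
print(quality_first.name, balanced.name)
```

The same two models produce different winners depending on the weights, which is why the criteria have to be settled before routing can be called correct.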