Self-driving AI observability and evals for agents

Respan Gateway - One AI gateway with built-in observability and evals

Y Combinator

•22d ago

Respan AI Gateway connects your app to 1,000+ AI models through one endpoint. But routing is the easy part. Respan keeps production AI reliable and under control with fallbacks, retries, caching, spend limits, alerts, and full traces for every call. Gateway, observability, evals, prompt management, monitors, and cost controls all run on one platform, so you do not need to stitch together five tools to debug production.

Having caching and fallbacks baked into one endpoint is a massive win for customer-facing AI features like conversational marketing bots. How does the gateway handle latency during failovers? Is the switch seamless enough that the end-user won't notice a lag?

22d ago

Respan

Maker

@andika_fadhilah this matters a lot!

For failovers, there is usually a small retry / routing latency, since the gateway needs to detect the provider issue and move the request to another live provider. In most cases, though, the end user notices little beyond a slightly slower response.
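To make the failover cost concrete, here is a minimal sketch of hard-error fallback across an ordered list of providers. It is not Respan's implementation: `ProviderError`, `call_with_fallback`, and the two stand-in providers are hypothetical names used only for illustration. The point is that the extra latency is roughly the time spent on the failed attempt before the next provider is tried.

```python
import time


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails outright."""


def call_with_fallback(providers, prompt):
    """Try each (name, callable) pair in order.

    Returns (provider_name, response, elapsed_seconds) for the first
    provider that succeeds, or raises ProviderError if all of them fail.
    """
    start = time.monotonic()
    errors = []
    for name, call in providers:
        try:
            response = call(prompt)
        except ProviderError as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc}")
            continue
        return name, response, time.monotonic() - start
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-in providers (assumptions, not real SDK calls).
def flaky_primary(prompt):
    raise ProviderError("503 from upstream")


def healthy_backup(prompt):
    return f"echo: {prompt}"


name, response, elapsed = call_with_fallback(
    [("primary", flaky_primary), ("backup", healthy_backup)], "hello"
)
print(name, response)
```

In a real gateway the failed attempt would usually be bounded by a timeout, which is what keeps the added delay small.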

22d ago

💡 Bright idea

I don't work in AI infra, but even from the outside, the "something broke and you don't know why" problem makes total sense. Having one place to see what's happening instead of piecing it together sounds like it saves a lot of pain. Congrats on the launch.

22d ago

Respan

Maker

@sidraarifali That’s exactly the pain we hear from teams. The first version usually works fine, but once real users, providers, prompts, and costs are involved, it gets messy fast.

22d ago

DIY UX Test

Putting evals at the gateway layer instead of bolting them on downstream is a smart place to catch regressions before they reach prod. Does Respan run evals against live traffic samples, or is it more of a pre-deploy gate?

22d ago

Respan

Maker

@oleksii_sekundant We support both!

Teams can run evals as a pre-deploy gate before rolling out a new prompt, model, or workflow version. That helps catch regressions before they hit production.

They can also run evals on live traffic samples, so you can monitor quality over real user behavior instead of only testing against static cases.

22d ago

Congrats on the launch! Genuine question from someone running multi-provider LLM calls in production: when a provider degrades mid-request (slow but not erroring), does the gateway support latency-based failover, or only hard-error fallback? And can the cost observability enforce per-provider daily caps, or is it reporting-only? The eval layer baked into the gateway is the part I haven't seen elsewhere — curious how you keep eval prompts from polluting the usage metrics.

22d ago

Respan

Maker

@mikebrandswarm Great question!

Today, we support hard-error fallback, and latency-based failover is in the pipeline. For slow-but-not-erroring providers, we know this is a real production issue, so we’re designing it around configurable latency thresholds and safe handoff behavior.

On cost, it is not reporting-only. We support both soft caps and hard caps. Soft caps can trigger Slack / email alerts, while hard caps can block requests based on the settings you configure per API key, route, or provider.
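As a rough illustration of the soft cap / hard cap distinction, the sketch below tracks spend for one scope (an API key, route, or provider) and returns a decision per request. The `SpendCap` class, its field names, and the dollar figures are assumptions made for this example, not Respan's actual configuration or API.

```python
from dataclasses import dataclass


@dataclass
class SpendCap:
    """Hypothetical spend tracker for a single API key, route, or provider."""

    soft_limit_usd: float
    hard_limit_usd: float
    spent_usd: float = 0.0

    def check(self, cost_usd: float) -> str:
        """Return 'block', 'alert', or 'allow' for a request of the given cost."""
        projected = self.spent_usd + cost_usd
        if projected > self.hard_limit_usd:
            # Hard cap: reject the request and do not record the spend.
            return "block"
        self.spent_usd = projected
        if projected > self.soft_limit_usd:
            # Soft cap: let the request through but signal an alert.
            return "alert"
        return "allow"


cap = SpendCap(soft_limit_usd=8.0, hard_limit_usd=10.0)
decisions = [cap.check(cost) for cost in (5.0, 4.0, 2.0)]
print(decisions)  # ['allow', 'alert', 'block']
```

The "alert" branch is where a Slack or email notification would be sent; the sketch only returns the decision.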

For evals, we separate eval traffic from production traffic with metadata / tags / environments, so eval prompts can be traced and analyzed without polluting normal usage metrics like customer usage, token volume, latency, or production cost reporting.
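One way to picture the separation is to tag each logged request with an environment and aggregate usage per tag, so eval runs never land in the production totals. This is a sketch under assumptions: the record shape, the `environment` field name, and the choice to count untagged records as production are all hypothetical.

```python
from collections import defaultdict


def summarize_tokens(records):
    """Sum token usage per environment tag.

    Assumption: records without an 'environment' tag count as production.
    """
    totals = defaultdict(int)
    for record in records:
        env = record.get("environment", "production")
        totals[env] += record["tokens"]
    return dict(totals)


# Hypothetical request log entries.
records = [
    {"environment": "production", "tokens": 120},
    {"environment": "eval", "tokens": 900},
    {"environment": "production", "tokens": 80},
]
totals = summarize_tokens(records)
print(totals)  # {'production': 200, 'eval': 900}
```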

22d ago

Humalike

Interesting take with Respan: Self-driving AI observability and evals for agents. What made you decide to build this now?

22d ago

Respan

Maker

@borrellbr We started with observability because that’s the first major pain teams hit in production. Once real users are making LLM calls, you need to know what happened, which model was used, why something failed, and where cost or latency is coming from.

Evals became the natural next step because once you have the traces and data, you can do more than just look back manually. You can start checking quality, regressions, and failures proactively.

That’s also why we’re moving toward more self-driving observability. Teams should not have to open the dashboard every day just to find problems. The platform should surface the important issues, run checks, and help teams catch things before they become bigger production problems.

22d ago

Conduit AI

Incredible team and product!

22d ago

Respan

Maker

Thanks for the support, @punn_kam. Appreciated!

22d ago

Huge fan of the routing and spend-limiting features so far.
It really bridges the gap between a standard API router and a full-scale LLMops production platform.
Having traces baked in makes managing live traffic so much cleaner.

22d ago

Respan

Maker

@kevin_huang_ynng_ Thanks so much! Glad to hear the routing and spend-limiting features are hitting the mark. That gap between a simple API router and full-scale LLMops is exactly what we were aiming to solve. Having traces baked in definitely helps keep live traffic manageable, and it's great to hear it's making a difference on your end.

22d ago

The Prompting Company

Congrats on the launch!!

22d ago

Respan

Maker

Thanks for the support, @michelle_marcelline. Appreciated!

22d ago

🔌 Plugged in

How does Keywords AI handle niche or low-volume keywords differently than other tools?

22d ago

Respan

Maker

Thanks for the comment, @hamza_afzal_butt. For niche or low-volume keywords, Keywords AI tries to go beyond just raw search data. It looks at semantic relevance, context, and related intent to surface opportunities that traditional tools might miss. The idea is to give actionable insights even when volume is low, so you can still target terms that have real potential.

22d ago

Protaigé

💎 Pixel perfection

Good stuff. However, I do not think routing is the easy part. It's only easy if it's not done properly. Routing needs to figure out the best model, and that means defining the criteria for "best". If it's best output + speed + price, then routing needs to detect the intent behind what's flowing through it and adjust accordingly.

22d ago

Respan

Maker

@ali_shaheen Totally get what you're saying. Routing can feel deceptively simple until you start factoring in output quality, speed, and cost. Detecting intent accurately and dynamically adjusting to pick the right model is really where it gets tricky. That balance is something we've been thinking a lot about with Keywords AI: making it smart enough to choose the best model for the task without slowing things down or driving up cost.
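The quality / speed / cost trade-off discussed above can be sketched as a weighted score over candidate models. Everything here is an assumption for illustration: the `ModelProfile` fields, the example numbers, and the weights are made up, and the sketch does not cover intent detection, which in practice would be what sets the weights for each request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model stats a router might keep."""

    name: str
    quality: float      # 0..1, higher is better
    latency_s: float    # typical response time in seconds
    cost_per_1k: float  # USD per 1,000 tokens


def pick_model(models, w_quality, w_latency, w_cost):
    """Pick the model with the best weighted score.

    Latency and cost are normalized by the worst candidate so each
    penalty falls in the range 0..1 before weighting.
    """
    max_latency = max(m.latency_s for m in models)
    max_cost = max(m.cost_per_1k for m in models)

    def score(m):
        return (
            w_quality * m.quality
            - w_latency * m.latency_s / max_latency
            - w_cost * m.cost_per_1k / max_cost
        )

    return max(models, key=score)


models = [
    ModelProfile("large", quality=0.95, latency_s=4.0, cost_per_1k=10.0),
    ModelProfile("small", quality=0.70, latency_s=1.0, cost_per_1k=1.0),
]
quality_first = pick_model(models, w_quality=1.0, w_latency=0.1, w_cost=0.1)
budget_first = pick_model(models, w_quality=0.2, w_latency=0.4, w_cost=0.4)
print(quality_first.name, budget_first.name)  # large small
```

Changing only the weights flips the choice, which is the commenter's point: "best" is undefined until the criteria are.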

22d ago
