Open-source AI agent monitoring platform. Latitude automatically detects all the ways your agents fail at scale, and gives your coding agent the tools to fix it.
Replies
For agent systems with non-deterministic outputs, how do you define failure in a way that's consistent enough to monitor reliably at scale?
This is a super clean approach to agent observability! Triage is a nightmare when you're just staring at a massive, unorganized stream of logs. Grouping traces into auto-clustered issue datasets makes finding where a trajectory went wrong way faster.
How does Latitude handle automated regression testing once a fix for a specific trace issue is pushed?
The MCP-into-the-coding-agent piece is the clever bit, and underrated in the thread so far. Most observability tools die at the dashboard — signals pile up where nobody looks, so failures just rot. Routing the signal to where the fix actually happens (the editor) is the real unlock; detection was never the bottleneck, action was.
One sharp question on that loop: when you auto-generate an eval per signal and hand it to the coding agent to fix against, how do you keep the agent from overfitting to the eval — patching the specific failing cases rather than the underlying behavior, so the cluster 'closes' but the real issue persists? Curious if there's a held-out/regression check or a human-in-the-loop on the generated evals. That's the failure mode I'd worry about most with auto-fix.
Congrats on shipping this — genuinely needed.
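One way to picture the held-out check raised above, as a rough sketch only: split each cluster's failing cases into a visible set the coding agent may fix against and a held-out set it never sees, then flag the fix if it passes far more visible cases than held-out ones. Every name here (`split_cluster`, `looks_overfit`, the `passes` callback, the 0.3 and 0.2 defaults) is hypothetical and is not taken from Latitude's product or API.

```python
import random
from typing import Callable, List, Sequence, Tuple


def split_cluster(
    cases: Sequence[str], held_out_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle a cluster's failing cases and reserve a held-out slice.

    Returns (visible, held_out). The seed keeps the split reproducible.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_held = max(1, round(len(shuffled) * held_out_fraction))
    return shuffled[n_held:], shuffled[:n_held]


def pass_rate(cases: Sequence[str], passes: Callable[[str], bool]) -> float:
    """Fraction of cases for which the (hypothetical) eval passes."""
    if not cases:
        return 0.0
    return sum(1 for case in cases if passes(case)) / len(cases)


def looks_overfit(
    visible: Sequence[str],
    held_out: Sequence[str],
    passes: Callable[[str], bool],
    max_gap: float = 0.2,
) -> bool:
    """Flag a fix whose visible pass rate beats the held-out rate by more than max_gap."""
    return pass_rate(visible, passes) - pass_rate(held_out, passes) > max_gap


# Usage: a "fix" that only special-cases the visible examples gets flagged.
cases = [f"case-{i}" for i in range(10)]
visible, held_out = split_cluster(cases)
patched_only = set(visible)
print(looks_overfit(visible, held_out, lambda c: c in patched_only))
```

A large gap between the two pass rates suggests the patch memorized the visible cases. A human review of the generated eval would still be needed to catch an eval that tests the wrong behavior in the first place.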
Congrats on the launch, Cesar! The "cluster conversations into failure modes" piece is the part I'd get the most from. One question from running agents that deliberately hand off to a human: how does Latitude tell a real failure apart from a correct escalation? In our setup the agent is supposed to stop and route anything sensitive — refunds, account changes — to a person, so a "drop-off" there is it doing its job, not breaking. Does it learn which escalations are intended vs the agent actually giving up?
Solving one of the most difficult parts when shipping AI agents!!! How to extract bugs, fixes and improvements from your traces...
This team rocks 🚀🤘
Great work! How does this connect back to the development workflow? Is there a process for running evals to validate that the issue is actually resolved before deploying?
the framing of agent conversations as qualitative data is really sharp. most teams just look at error rates and latency, but the actual content of what your agent says to users is where the real failure modes hide. curious how you handle the evaluation of subjective quality — like when an agent is technically correct but the response still feels wrong to the user?
The useful bit here is closing the loop from failure mode to a runnable fix, not just another trace dashboard.
For agent monitoring, I’d want each clustered issue to produce a small acceptance case: trigger, tool/write that failed, expected boundary, and proof the fix changed behavior. Is that what the MCP server hands to the coding agent?
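The four-part acceptance case described above could be written down as a small record, sketched here under assumptions: the field names, the `proves_fix` helper, and the sample values are all hypothetical illustrations, not the payload the MCP server actually hands to the coding agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCase:
    """One clustered issue reduced to a checkable case (hypothetical shape)."""

    trigger: str             # input or situation that sets off the failure
    failed_step: str         # tool call or write that went wrong
    expected_boundary: str   # what the agent should have done instead
    passed_before_fix: bool  # eval result on the original behavior
    passed_after_fix: bool   # eval result after the fix was applied

    def proves_fix(self) -> bool:
        """True only if the case failed before the fix and passes after it."""
        return (not self.passed_before_fix) and self.passed_after_fix


# Usage: an invented example, for illustration only.
case = AcceptanceCase(
    trigger="user asks for a refund on a shipped order",
    failed_step="agent called a refund tool directly",
    expected_boundary="agent escalates refunds to a human",
    passed_before_fix=False,
    passed_after_fix=True,
)
print(case.proves_fix())
```

Requiring a failing result before the fix is what separates proof of changed behavior from a case that would have passed anyway.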
Great product! Curious: for teams running multiple agents across different use cases, does Latitude monitor them all in one dashboard, or does each agent need its own separate setup?
StartupBase
Logs vs issues is such a clean way to frame it. Nobody actually reads logs. Failure modes with evals attached is the thing you fix.