Garry Tan

Atla - Automatically detect errors in your AI agents

Atla is the only eval tool that helps you automatically discover the underlying issues in your AI agents. Understand step-level errors, prioritize recurring failure patterns, and fix issues fast, before your users ever notice.

Sashank Pisupati

Massively proud of the whole @Atla team for getting us here - it's been a labor of love, and we're finally out there ❤️

We spend all our time thinking about how to diagnose agent failures better, faster & smarter - and we've found the most reliable route to be focusing on recurring failure patterns (to cut through the noise), while keeping an eye out for new ones (to stay on-policy).

I think we've built something pretty cool that attempts to do that, but more importantly we're eager to learn continuously from feedback and make our eval tools better - so that people can make their agents better. Give us a try & let us know what you think!

Young Sun Park

@thelemonbot Pattern king

Konrad Urban

Big congrats to the Atla team on launch!!

Debugging AI agents has always felt like chasing shadows. Not anymore.

What I love most:

  • Step-level visibility

  • Pattern clustering

  • Actionable fixes + integrations with tools like Claude Code make it feel like an engineer is already drafting the PR for you.

  • And the ability to chat with traces is a total game changer. Finally, a way to ask “what’s really happening here?” and get a real answer, backed by data.

Super excited to see where the roadmap takes it. Congrats again, Roman, Jackson, and team! This is going to be a must-have for anyone building at the frontier of AI.

Young Sun Park

@kkonrad Thanks for your support Konrad 🥜! Happy to see you highlight the chat with traces feature, which the team made a big push to ship for this launch! We want agent builders to not only see critical failures quickly, but also dig deeper into issues that matter most for their own users.

When you chat with traces, you get an answer and a list of relevant traces where that issue is occurring. Excited for people to use this and more in Atla.

Armin Schöpf

Nice! Really enjoyed the demo. It seems like it can easily surface the cause of errors that took us a long time to debug previously.

Also liked the compare feature, as it seems to uncover the different failure modes of models and show the improvements / degradations between experiments.

Excited to implement it and see whether just handing the quick fix to Claude Code will then solve the errors. That would be fantastic.

Roman

Exactly — the core value is in automatically surfacing failure patterns and highlighting what matters, so you don’t drown in noisy logs.

Early tests show Claude Code can already implement fixes quite well. We’re working on making it more reliable by detecting precise failure patterns, which lets coding agents apply targeted fixes and avoid regressions. That way they can iterate quickly through errors.

Matt Falconer

Congrats! Atla is a much needed product - and it's awesome to see this launch.

Mathias Leys

Thanks Matt, appreciate the kind words!

Kyle

Excited to launch Atla 🚀

We built it so agent teams can ship faster, more reliably. Huge shoutout to the team for the grind that got us here. Can’t wait to help make your agents better—curious how you’re debugging today and where we can support!

Jack Miller

Very exciting! Have known the Atla team for a while now and they are excellent engineers and researchers :)

Roman

Thanks Jack for the nice words!

Yehan Xiao

Really smart concept. Using AI to debug AI just makes sense, especially when you're dealing with complex agent behaviors. Way better than trying to manually catch all these edge cases.

Roman

Completely agree! Our approach preserves back-traceability from failure patterns down to the individual spans where they occurred. This also lets you organically build up an evaluation dataset from failure patterns.
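The back-traceability idea above can be sketched roughly like this. This is a hypothetical illustration, not Atla's actual data model: the `Span`, `FailurePattern`, and `build_eval_dataset` names are made up for the example. The point is that keeping a link from each failure pattern back to the spans where it occurred lets you derive an eval dataset directly from recurring failures.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    span_id: str
    agent_step: str   # e.g. the LLM output or tool call at this step
    trace_id: str

@dataclass
class FailurePattern:
    name: str
    description: str
    spans: list = field(default_factory=list)  # back-links to offending spans

    def add_occurrence(self, span: Span) -> None:
        self.spans.append(span)

def build_eval_dataset(patterns):
    """Turn each (pattern, span) pair into an eval case."""
    return [
        {"input": span.agent_step, "expected_failure": p.name, "trace": span.trace_id}
        for p in patterns
        for span in p.spans
    ]

pattern = FailurePattern("wrong_tool_args", "Agent passes malformed arguments")
pattern.add_occurrence(Span("s1", "search(query=None)", "t42"))
dataset = build_eval_dataset([pattern])
print(len(dataset))  # 1
```

Because every eval case carries its `trace` back-link, a fix can be verified against exactly the traces where the pattern originally showed up.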

Cruise Chen

Finally someone has crafted a tool that evals agents... too many agents nowadays, and I believe Atla could be a stress-testing tool for them... How does it cater to different scenarios and business logic?

Sashank Pisupati

Thank you @cruise_chen! Super important to stress test agents before sending them into the wild.

We've benchmarked our granular LLMJ annotator on many scenarios (customer support, coding agents, browsing, etc.), but the real adaptiveness comes from aggregating these into failure patterns tailored to each individual agent - rather than generic eval criteria, you see the specific ways in which your agent is misbehaving.

We're already working on the next steps of customizability: letting users dynamically shape patterns over time to make them their own, and understanding how different patterns influence specific business metrics of interest!
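The aggregation idea described above can be sketched in a few lines. This is a hypothetical simplification (Atla's actual clustering is surely more sophisticated than frequency counting): collect step-level error annotations across many traces, then surface the ones that recur, so builders see their agent's specific failure modes rather than generic scores. The annotation records and `recurring_patterns` helper are invented for the example.

```python
from collections import Counter

# Toy step-level error annotations gathered across several traces.
annotations = [
    {"trace": "t1", "step": 3, "error": "ignored_refund_policy"},
    {"trace": "t2", "step": 1, "error": "ignored_refund_policy"},
    {"trace": "t3", "step": 2, "error": "hallucinated_order_id"},
    {"trace": "t4", "step": 5, "error": "ignored_refund_policy"},
]

def recurring_patterns(annotations, min_count=2):
    """Aggregate annotations and keep only errors seen at least min_count times."""
    counts = Counter(a["error"] for a in annotations)
    return [(err, n) for err, n in counts.most_common() if n >= min_count]

print(recurring_patterns(annotations))
# [('ignored_refund_policy', 3)]
```

One-off errors like `hallucinated_order_id` fall below the threshold, which is how recurring patterns cut through the noise while new patterns can still be watched for separately.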

Sneh Shah

Congratulations on the launch! 🚀 I’m building AI agents for business workflows, and error detection is always tough. Does Atla only look at LLM outputs, or can it diagnose issues across the whole agent process—including code and APIs? How customizable is the error tracking for unique workflows? Would love to hear if teams use Atla for improving non-LLM agents too.

Henry Broomfield

@sneh_shah this is a great question! We currently focus on LLM outputs, which include the LLM tool calls (i.e. the tool-call arguments) as well as handoffs to other agents. We therefore assume the tool outputs themselves are correct and leave the intricacies of tool implementation and tool-error handling to the developer, though we do pick up on how the agent reacts to tool outputs. Systematic issues across agent processes are then highlighted as common failure patterns.

The error tracking is automatically customised to your system message and tool information - we measure how well the agent follows the policy and completes the task you have specified, rather than having you repeat this information in the evaluation. For further customisability, individual metrics can be tracked in our custom metrics suite.
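To make the "LLM outputs, not tool outputs" scope concrete, here is a hypothetical step-level check (not Atla's implementation) that validates the arguments of a tool call emitted by the LLM against the tool's declared parameters. Evaluating at this level flags the agent's mistake without executing the tool itself; the `check_tool_call` function and schema shape are assumptions for the example.

```python
import json

def check_tool_call(tool_call: dict, tool_schema: dict) -> list:
    """Return a list of issues found in one LLM tool call."""
    issues = []
    try:
        args = json.loads(tool_call["arguments"])
    except (json.JSONDecodeError, KeyError):
        return ["arguments are not valid JSON"]
    params = tool_schema["parameters"]
    # The LLM must supply every required argument...
    for name in tool_schema.get("required", []):
        if name not in args:
            issues.append(f"missing required argument: {name}")
    # ...and must not invent arguments the tool does not accept.
    for name in args:
        if name not in params:
            issues.append(f"unexpected argument: {name}")
    return issues

schema = {"parameters": {"query": "string"}, "required": ["query"]}
print(check_tool_call({"arguments": '{"q": "weather"}'}, schema))
# ['missing required argument: query', 'unexpected argument: q']
```

A check like this inspects only what the LLM produced, so responsibility for the tool's own behaviour stays with the developer, matching the division of labour described above.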

lerato kgopa

Atla looks super helpful in discovering root causes of errors, not just raising alerts.