What building an AI incident agent actually taught us
We've spent the last 6 months pointing an agent at real production errors. Going in, we were sure the hard problem was "can an AI write the fix." Almost everything that mattered turned out to be somewhere else. Three surprising things we learned:
1. Most errors were never worth a page
Across real traffic, ~70% of errors triage out as noise. Which means they are not actionable and no human is needed. This rate holds surprisingly consistently across tenants and services. I still remember integrating with our first design partner. First Slack notification comes in, the team is high-fiving, pure excitement. This thing actually works! Next thing we know our Slack inbox blows up and a burst of 12 notifications arrive. So we built a relevance gate to make sure notifications only happen if something truly breaks. We set out to fix bugs and discovered the bigger win was the 70% of times we give time back.
