When running AI agents in production, what's the one thing that breaks the most often?

by•10d ago

I've talked to enough teams about this that I'm curious whether the pattern holds here.

For folks running agents in production right now, what fails the most often:

- Agent goes off-script and produces something unexpected

- Integration with a connected system silently breaks

- Token cost spikes from a runaway loop

- Compliance or audit issue surfaces after the fact

- Something else entirely

Honest poll, no judgment. The answer is rarely the model; that's the part everyone tests. It's almost always something boring nearby.

Replies

The boring nearby thing I’d pick is context contract drift.

The model may behave, but the world around it changes: a CRM field gets renamed, a policy doc is stale, a Slack thread overrides the plan, or the agent quietly starts treating an inference like a fact. Then the failure shows up as “off-script” even though the root cause was a source, permission, or recency problem.

For production agents, I’d want every run to answer: what context did you trust, what changed since last run, and which assumption was weakest?
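
A rough sketch of what a per-run record answering those three questions could look like. Everything here is hypothetical: `RunReport`, its field names, and the 0-to-1 confidence scores are made up for illustration, not taken from any agent framework.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RunReport:
    """Hypothetical per-run record: trusted context, changes, assumptions."""
    trusted_sources: List[str] = field(default_factory=list)
    changed_since_last_run: List[str] = field(default_factory=list)
    # Maps an assumption (plain text) to an assumed 0..1 confidence score.
    assumptions: Dict[str, float] = field(default_factory=dict)

    def weakest_assumption(self) -> Optional[str]:
        """Return the lowest-confidence assumption, or None if none were logged."""
        if not self.assumptions:
            return None
        return min(self.assumptions, key=self.assumptions.get)


report = RunReport(
    trusted_sources=["crm:accounts", "policy:refunds"],
    changed_since_last_run=["policy:refunds"],
    assumptions={
        "refund window is 30 days": 0.4,
        "account owner field is current": 0.9,
    },
)
print(report.weakest_assumption())
```

How the confidence scores get assigned is the hard part and is left open here; the point is only that the three answers are cheap to store next to each run.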

9d ago

@jim_jeffers agreed.

Context contract drift is such a good name for it. We had exactly this happen: a policy doc got updated, but the agent was still pulling from a cached version. It passed every test because, technically, the agent was doing what it was told, just based on stale info. Nobody noticed for two weeks until a client flagged an inconsistency.

The "what changed since last run" question is something we ended up building into our monitoring after that incident. It's a simple diff check on source documents before each run. Not elegant, but it caught three more drift issues in the first month alone. Feels like this should be standard in every agent framework tbh.
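
For anyone who wants to try something similar, here is a minimal sketch of a pre-run diff check, assuming the source documents can be read as bytes and that the previous run's fingerprints were saved somewhere. The function names and the in-memory dicts are illustrative assumptions, not the actual monitoring code described above.

```python
import hashlib
from typing import Dict, List


def fingerprint(docs: Dict[str, bytes]) -> Dict[str, str]:
    """Map each document name to the SHA-256 hex digest of its content."""
    return {name: hashlib.sha256(body).hexdigest() for name, body in docs.items()}


def diff_sources(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two fingerprint maps and list added, removed, and changed documents."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(name for name in shared if previous[name] != current[name]),
    }


# Fingerprints saved from the last run vs. what the agent would read now.
last_run = fingerprint({"policy.md": b"refunds within 30 days", "faq.md": b"unchanged"})
this_run = fingerprint({
    "policy.md": b"refunds within 14 days",
    "faq.md": b"unchanged",
    "pricing.md": b"new doc",
})

drift = diff_sources(last_run, this_run)
if any(drift.values()):
    print("Source drift detected before run:", drift)
```

Hashing only tells you that something changed, not what; a real setup would likely pair it with a text diff or a review step for the flagged documents.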

9d ago