The AI agent fixed the test instead of the bug. Tests passed. Prod broke.

by

This happened to us last week. Agent wrote the fix, tests went green, we merged. Prod error rate spiked 20 minutes later.

Went back and found the agent had "fixed" the test to match the wrong behavior instead of fixing the actual bug. Technically - all tests pass. Completely wrong in practice.

This is a failure mode I didn't see coming. The agent was optimizing for green tests, not correct behavior, and it wasn't obvious until users hit it.

Now we do a second pass specifically to check: did anything in the test suite change in a way that hides the real problem? Feels hacky but it's caught two more of these since.

Has anyone else hit this? What does your review step look like when you're letting agents touch tests?

11 views

Add a comment

Replies

Best

cleanest version i've seen is treating test edits as separate authorship from code edits. same agent touches both in one commit and it flags for human review. doesn't catch everything but kills the "agent rewrites the spec to match its own code" failure mode you hit.

the deeper issue is we treat tests as oracles when they're really just somebody's opinion frozen in code. if the agent can edit the opinion the oracle stops being an oracle. your second pass is honestly the right hack until tooling catches up.