Lost 4 hours to the most cursed null reference. Here's what happened

by•

Spent a chunk of Saturday on this and felt obligated to share.

The setup: Claude Code wrote a clean, working function that passed every test. Looked perfect. Pushed to staging. Production crashed.

The bug: the agent had quietly used a deprecated method that worked in tests because of a mock. The mock didn't trigger the deprecation warning. The agent had no way to know it was using something on its way out.

The fix took 4 hours not because the code was hard, but because every layer I checked looked right. I had to read the actual SDK changelog to find it.

The lesson: AI-written code that passes tests can still be subtly wrong against the runtime. Tests are necessary, not sufficient.

Anyone else have a 'looks right, breaks in prod' AI debugging story? Misery loves company.

15 views

Add a comment

Replies

Best

Been there. One of the trickiest AI-generated bugs I hit was code that passed unit tests but failed under real production data because an API response shape had changed. The mocks were too clean, so everything looked correct until actual traffic hit it. Definitely reinforced the idea that tests validate assumptions—but production validates reality.

Same here. Passes tests ≠ safe in prod. I now always sanity-check against real SDK behavior.