What should Codex prove before it edits production iOS code?

I am building ShipGuard around one rule: agent speed is great, but risky iOS work needs a proof lane.

For me, the proof I require depends on the surface the change touches:

- build/run for compile errors and obvious integration breaks

- logs/debugger for runtime behavior

- simulator for flows

- profiler for performance claims

- device/TestFlight/App Store/manual review when local proof is not enough
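
To make the list above concrete, here is a rough sketch of it as a policy table an agent harness could consult. Everything named here (`Proof`, `REQUIRED_PROOF`, `required_proof`, and the claim strings) is hypothetical and only for illustration, not ShipGuard's actual API. The fallback to the strictest lane for unrecognized claims is also my assumption.

```python
from enum import Enum


class Proof(Enum):
    """Proof lanes, ordered roughly from cheapest to most expensive."""
    BUILD_RUN = "build/run"
    LOGS_DEBUGGER = "logs/debugger"
    SIMULATOR = "simulator"
    PROFILER = "profiler"
    DEVICE_OR_REVIEW = "device/TestFlight/App Store/manual review"


# Hypothetical claim names mapped to the minimum proof the agent must attach.
REQUIRED_PROOF = {
    "compiles": Proof.BUILD_RUN,
    "integration": Proof.BUILD_RUN,
    "runtime_behavior": Proof.LOGS_DEBUGGER,
    "user_flow": Proof.SIMULATOR,
    "performance": Proof.PROFILER,
}


def required_proof(claim: str) -> Proof:
    """Return the minimum proof lane for a claim.

    Assumption: anything not in the table is treated as a case where
    local proof is not enough, so it escalates to the strictest lane.
    """
    return REQUIRED_PROOF.get(claim, Proof.DEVICE_OR_REVIEW)


if __name__ == "__main__":
    for claim in ("compiles", "performance", "push_notifications"):
        print(f"{claim}: {required_proof(claim).value}")
```

The point of the table shape is that the agent cannot pick its own evidence: a performance claim without a profiler trace simply fails the lookup.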

Curious where you draw the line. When is a unit test enough, and when do you make the agent bring simulator or device receipts?
