How do you tell a real regression from model noise when replaying a run?

When you replay or fork a run in Retrace, the steps before the fork come from the recording, but everything after runs live against the model. So two runs of the same input rarely match exactly, even when nothing actually broke.

That makes the useful question harder than it sounds: when a replay diverges, is it a real regression from your change, or just provider non-determinism? Retrace currently shows a first-divergence diff and a verdict of improved, regressed, or unchanged, but I would like to hear how others handle it. What tolerance do you use in practice, and would you rather see a strict step-by-step diff or a semantic comparison of each step?

4 views

How do you tell a real regression from model noise when replaying a run?

Replies