Antoine Nguyen left a comment
Really excited to share Parity! We kept running into the same frustrating loop: we’d make a small prompt or harness change, run evals, and still not feel confident that the intended behavior had actually changed, or know where it started to break. That gap between “we changed something,” “we think it worked” (vibe checks), and “we know what actually changed” was the thing we wanted to fix. Would love...

Parity: Auto-evals for harness changes
Catch AI behavior changes before they ship
Parity helps agent teams verify that prompt and harness changes actually changed behavior.
It monitors PRs for behavior-defining changes, identifies what changed, checks existing eval coverage, and generates targeted probe evals to test whether the new behavior shows up and where it stops holding.
Built for teams who want something faster and more reliable than manual spot checks and vibe testing.

