Where SLMs beat GPT-5
We've been seeing a consistent pattern across agent systems:
GPT-5 works well as a judge on average cases
but breaks down on edge cases and policy boundaries.
That's exactly where reliability matters.
In our recent work, we took a different approach:

