Riyad Sarsour

Dutchman Labs - Eval Studio - Test Your Agents Faster

Speed up your testing and agent validation


Replies

Mykola Kondratiuk

Curious how the test runner handles non-determinism. Agents give different outputs on identical inputs. That is not a bug, but it breaks most eval frameworks that expect stable assertions.

Riyad Sarsour

@mykola_kondratiuk Well, it's a balance: did the trajectory of expected function calls ultimately walk down the path you want for the agent? For example, did the agent run a Google search, then click a link, then do the work on the site, or did it just go straight to a domain and do the work? Both may evaluate as good, but one is the happy, desirable path.
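To make the trajectory idea concrete, here is a minimal Python sketch, assuming each agent run is logged as an ordered list of tool-call names plus a pass/fail flag for the final outcome. The tool names (`google_search`, `click_link`, `do_work`, `navigate_to_domain`) and the helper functions are hypothetical; this is not Eval Studio's actual API. A run that succeeds *and* contains the preferred steps in order is graded as the happy path; a run that succeeds some other way is still "acceptable". Grading several repeated runs of the same input, rather than asserting on one exact output, is one possible way to cope with the non-determinism raised in the question.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple

def follows_path(calls: Sequence[str], preferred: Sequence[str]) -> bool:
    """True if the preferred steps appear in `calls` in order.

    Extra steps in between (e.g. a scroll) are tolerated, so minor
    run-to-run variation does not break the check.
    """
    it = iter(calls)
    # `step in it` consumes the iterator up to the first match,
    # so each later step must occur after the previous one.
    return all(step in it for step in preferred)

def grade_run(calls: Sequence[str], preferred: Sequence[str], outcome_ok: bool) -> str:
    """Grade one run: outcome first, then whether it took the preferred path."""
    if not outcome_ok:
        return "fail"
    return "happy_path" if follows_path(calls, preferred) else "acceptable"

def grade_runs(runs: Iterable[Tuple[Sequence[str], bool]], preferred: Sequence[str]) -> Counter:
    """Tally grades over repeated runs of the same input."""
    return Counter(grade_run(calls, preferred, ok) for calls, ok in runs)

# Hypothetical logs from four runs of the same task.
preferred = ["google_search", "click_link", "do_work"]
runs = [
    (["google_search", "click_link", "do_work"], True),           # happy path
    (["navigate_to_domain", "do_work"], True),                    # good result, different path
    (["google_search", "scroll", "click_link", "do_work"], True), # happy path with an extra step
    (["google_search"], False),                                   # task not completed
]
print(grade_runs(runs, preferred))
```

The ordered-subsequence check is a deliberate choice: an exact-match comparison of the whole call list would flag harmless extra steps as failures, which is the brittleness the question points at.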