Dutchman Labs - Eval Studio - Test Your Agents Faster
Speed up your testing and agent validation
Replies
Curious how the test runner handles non-determinism. Agents give different outputs on identical inputs. That is not a bug, but it breaks most eval frameworks that expect stable assertions.
Maker
@mykola_kondratiuk Well, it's a balance: did the trajectory of expected function calls ultimately follow the path you want for the agent? E.g., did the agent run a Google search, then click a link, then do work on the site, or did it just go straight to a domain and do the work? Both may evaluate as good, but only one is the desirable happy path.
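To make the trajectory idea concrete, here is a minimal sketch of one way to check it, assuming the agent's run is logged as an ordered list of tool-call names. The function names (`follows_path`, `grade`) and the tool names are hypothetical, chosen for illustration, and are not Eval Studio's API. Instead of asserting on the exact output, it checks whether the expected steps appear in order, allowing extra steps in between:

```python
from typing import Sequence


def follows_path(trajectory: Sequence[str], expected: Sequence[str]) -> bool:
    """Return True if `expected` appears in `trajectory` as an ordered subsequence.

    Extra steps between the expected ones are tolerated, so two runs that
    differ in detail can both pass.
    """
    steps = iter(trajectory)
    # `step in steps` advances the iterator, so each expected step must be
    # found after the previous one.
    return all(step in steps for step in expected)


def grade(trajectory: Sequence[str], happy_path: Sequence[str], reached_goal: bool) -> str:
    """Hypothetical three-way grade: fail, acceptable, or happy_path."""
    if not reached_goal:
        return "fail"
    return "happy_path" if follows_path(trajectory, happy_path) else "acceptable"


# Hypothetical tool names for the search-then-click example.
HAPPY_PATH = ["google_search", "click_link", "do_work"]

run_a = ["google_search", "scroll", "click_link", "do_work"]  # searched first
run_b = ["open_domain", "do_work"]                            # went straight to the site

print(grade(run_a, HAPPY_PATH, reached_goal=True))
print(grade(run_b, HAPPY_PATH, reached_goal=True))
```

Both runs count as good here, but only the first is labeled as the happy path, which keeps the assertion stable even when the outputs themselves vary from run to run.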