Launched this week

agentrial
Run your AI agent 20x. Get confidence intervals, not vibes.
3 followers
Your AI agent passed the test. But would it pass again? LLMs are non-deterministic: a task that passes once can still fail 30% of the time on later runs. agentrial runs each test case N times and reports confidence intervals instead of a single pass/fail. It provides Wilson confidence intervals on pass rates, failure attribution via Fisher's exact test, real API cost tracking, and CI/CD regression detection. Works with LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, or any Python callable. YAML config, MIT license.
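To make the "confidence intervals, not vibes" idea concrete, here is a minimal sketch of a Wilson score interval on a pass rate, using only the Python standard library. It is not agentrial's code or API: the function name `wilson_interval` and its parameters are hypothetical, chosen for illustration, and the default `z = 1.96` assumes a 95% confidence level.

```python
import math


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (hypothetical helper, not agentrial's API).

    passes: number of runs that passed
    runs:   total number of runs (N)
    z:      normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= passes <= runs:
        raise ValueError("passes must be between 0 and runs")

    p_hat = passes / runs                      # observed pass rate
    z2 = z * z
    denom = 1 + z2 / runs
    center = (p_hat + z2 / (2 * runs)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / runs + z2 / (4 * runs * runs)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    low, high = wilson_interval(passes=14, runs=20)
    print(f"pass rate 14/20, 95% Wilson CI: [{low:.3f}, {high:.3f}]")
```

With 14 passes out of 20 runs, the interval spans roughly 0.48 to 0.85, which shows how wide the uncertainty still is at N = 20. Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and remains informative when every run passes or every run fails.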




Free



