Launched this week

agentrial

Run your AI agent 20x. Get confidence intervals, not vibes.

3 followers

Your AI agent passed the test. But would it pass again? LLMs are non-deterministic: a task that passes once can still fail on a later run, for example 30% of the time. agentrial runs each test case N times and reports confidence intervals instead of a single pass/fail result. It provides Wilson confidence intervals on pass rates, failure attribution via the Fisher exact test, real API cost tracking, and CI/CD regression detection. Works with LangGraph, CrewAI, AutoGen, the OpenAI Agents SDK, or any Python callable. YAML config, MIT license.
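
To make the "confidence intervals, not vibes" idea concrete, here is a minimal sketch of a Wilson score interval on a pass rate, using only the Python standard library. It is not agentrial's code or API: the function name `wilson_interval` and its parameters are hypothetical, chosen for illustration, and the only assumption is that each run is recorded as a pass or a fail.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(passes: int, trials: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate.

    Hypothetical helper for illustration; not part of agentrial.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= passes <= trials:
        raise ValueError("passes must be between 0 and trials")

    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = passes / trials
    denom = 1 + z * z / trials

    # The Wilson center is pulled toward 0.5 when the sample is small.
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom

    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # An agent that passed 14 of 20 runs.
    low, high = wilson_interval(14, 20)
    print(f"pass rate 70%, 95% CI: {low:.1%} to {high:.1%}")
```

With 20 runs the interval is wide (14 of 20 passes gives roughly 48% to 85%), which is the point: a single green run says little about the underlying pass rate, and more runs tighten the estimate.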