Launching today

agentrial
Run your AI agent 20x. Get confidence intervals, not vibes.
Your AI agent passed the test. But would it pass again? LLMs are non-deterministic: the same task can fail 30% of the time on the next run. agentrial runs each test case N times and gives you confidence intervals instead of pass/fail. Wilson CI on pass rates, failure attribution via Fisher exact test, real API cost tracking, CI/CD regression detection. Works with LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, any Python callable. YAML config, MIT license.
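To make "confidence intervals instead of pass/fail" concrete, here is a minimal Python sketch of the Wilson score interval on a pass rate. This is illustrative only, not agentrial's API; the function name and run counts are hypothetical.

```python
# Minimal sketch of a Wilson score interval for a pass rate (illustrative,
# not agentrial's API): the statistic behind "confidence intervals, not vibes".
from math import sqrt

def wilson_ci(passes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for the true pass probability."""
    if runs == 0:
        return (0.0, 1.0)
    p = passes / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

# e.g. 14 passes out of 20 runs: point estimate 0.70, interval roughly (0.48, 0.85)
print(wilson_ci(14, 20))
```

The Wilson interval stays sensible at small N, which matters when every extra run costs real API calls.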

Hey everyone! I built agentrial because I was going crazy with my LangGraph agents randomly failing on the same prompts.
The core insight: treat agent evaluation as a statistical problem, not a deterministic one. Every test runs N times, so you get confidence intervals instead of "it passed once." When something fails, a Fisher exact test tells you which step is the bottleneck.
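As a rough illustration of step-level attribution with a Fisher exact test (not agentrial's internals; the step names and counts below are hypothetical), each step's failures can be compared against the rest of the pipeline in a 2x2 contingency table:

```python
# Sketch of failure attribution via Fisher exact test: for each step, compare
# its (failures, successes) against all other steps combined.
from scipy.stats import fisher_exact

# Hypothetical counts from 20 runs: (failures, successes) per step.
step_counts = {
    "retrieve": (2, 18),
    "plan": (9, 11),
    "execute": (1, 19),
}

total_fail = sum(f for f, _ in step_counts.values())
total_ok = sum(s for _, s in step_counts.values())

for name, (fail, ok) in step_counts.items():
    # 2x2 table: this step vs. the rest, failures vs. successes.
    table = [[fail, ok], [total_fail - fail, total_ok - ok]]
    _, p_value = fisher_exact(table)
    print(f"{name}: p = {p_value:.3f}")
```

A small p-value for a single step suggests its failures are overrepresented relative to the other steps.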
v0.2.0 just shipped with:
- 438 tests, 15 CLI commands
- 6 framework adapters + OpenTelemetry
- Agent Reliability Score (0-100 composite metric)
- VS Code extension (live on Marketplace)
- MCP security scanner for 6 vulnerability classes
- Production drift detection (CUSUM, Page-Hinkley, KS test; see the sketch after this list)
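As a sketch of the drift-detection idea (illustrative only, not agentrial's implementation; the scores below are hypothetical), a two-sample Kolmogorov-Smirnov test can compare a baseline window of a scored metric against a recent window:

```python
# Illustrative drift check: two-sample KS test on a per-run metric,
# e.g. judge scores or pass indicators aggregated per window.
from scipy.stats import ks_2samp

baseline_scores = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92, 0.89, 0.94]  # hypothetical
recent_scores = [0.71, 0.69, 0.75, 0.80, 0.68, 0.73, 0.77, 0.70]    # hypothetical

stat, p_value = ks_2samp(baseline_scores, recent_scores)
if p_value < 0.05:
    print(f"Drift detected (KS statistic {stat:.2f}, p = {p_value:.3f})")
```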
It's free and local-first: your prompts and data never leave your machine.
Would love your feedback, especially on what metrics matter most for your agent workflows.