Rippletide Eval CLI

Rippletide CLI is an evaluation tool for AI agents


Rippletide CLI is an interactive terminal tool to evaluate AI agent endpoints directly from your command line. It generates questions from the agent’s knowledge, supports predefined questions for reproducible benchmarking, and delivers clear hallucination KPIs. Get instant feedback on performance with real-time progress, automatic evaluation, and detailed reports.

Salim Boujaddi
Hunter

As an early engineer at Rippletide, I've spent countless hours testing AI agents and getting frustrated with all the vague performance metrics.

That's why we built Rippletide CLI: a terminal tool that lets you benchmark your AI agent directly from the command line. It generates questions from the agent's own knowledge, supports reproducible test sets, and gives clear KPIs on hallucinations.

Everything runs in real time with automatic evaluation and detailed reports, so you actually see where your agent struggles.

Would love to hear what the PH community thinks and get feedback from fellow AI builders 🚀

Patrick Joubert

@imbjdd Congrats on the launch! 🚀
Testing agents in the terminal is such a breath of fresh air compared to heavy UI dashboards. The fact that it generates questions from the agent's own knowledge base is a huge time-saver for building test sets.

We’ve all been through the frustration of 'vague metrics,' so seeing clear hallucination KPIs directly in the CLI is a massive win for the workflow. Can't wait to try this out on our latest endpoints!
Congrats again to the team! @yann_bilien

Austin Heaton

@imbjdd  @yann_bilien  @patricksclouds 

Congrats on the launch! What if you want to eval something that's outside of the templates in the dashboard? How does it work then? Or is it mostly focused on the terminal?

Yann BILIEN

@austin_heaton Thanks! If you want to evaluate a custom agent or something outside the dashboard templates, the CLI is the way to go. Here’s a guide to get started: https://docs.rippletide.com/docs/cli_guide 

Hope you like it

Yann BILIEN

Hi all, very excited to present an Agent evaluation module today!

As AI engineers, my team and I struggled to reliably tell whether the latest version of an agent was actually performing well or not.
So we built a module to evaluate agents, and we’re open-sourcing the hallucination measurement part.

How it works:

1 – Connect your agent
Use our CLI to provide your agent endpoint (localhost works).
Connect the data your agent needs. Today, we support PostgreSQL databases, internal APIs, and Pinecone as a vector store. If you’d like to add a new source, feel free to open a PR on the repo.
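
To make step 1 concrete, here's a minimal sketch of a local agent endpoint the CLI could point at. The /chat route and the question/answer JSON fields are just placeholders for illustration, not the CLI's documented contract (check the CLI guide for the real one):

# Minimal local agent endpoint sketch; the route and JSON schema are
# illustrative assumptions, not Rippletide's documented contract.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/chat", methods=["POST"])
def chat():
    payload = request.get_json(force=True)
    question = payload.get("question", "")
    # Replace this stub with a call to your actual agent / LLM pipeline.
    answer = f"Stub answer to: {question}"
    return jsonify({"answer": answer})

if __name__ == "__main__":
    # Serve on localhost so the CLI can reach it during evaluation.
    app.run(host="127.0.0.1", port=8000)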

2 – Launch tests
Tests are automatically generated to evaluate your agent’s behavior and make sure no possible wrong behavior is left out - stay safe. You can also add your own test set if needed.
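
If you do bring your own test set, the exact file format is whatever the CLI guide specifies; purely to illustrate the idea (field names here are hypothetical), a reproducible test set is just a fixed list of questions plus the facts you expect the answers to be grounded in:

# Hypothetical test-set shape; field names and file format are assumptions,
# not the CLI's actual schema.
import json

test_set = [
    {
        "question": "What is the refund window for annual plans?",
        "expected_facts": ["Refunds are available within 30 days of purchase."],
    },
    {
        "question": "Which regions is the service hosted in?",
        "expected_facts": ["The service is hosted in eu-west-1 and us-east-1."],
    },
]

with open("test_set.json", "w") as f:
    json.dump(test_set, f, indent=2)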

3 – Understand what failed
For each test question, we check every fact in the agent’s answer and verify whether it has a reference in our graph. We then explain where additional data is needed to improve your agent.
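
As a rough sketch of the scoring idea (not our actual graph lookup - the reference check is stubbed as a simple set membership test here), each answer is split into atomic facts, every fact is checked for a supporting reference, and the hallucination rate is the share of facts with no support:

# Simplified per-fact verification; the real check runs against a knowledge
# graph, stubbed here as a set of known facts.
def hallucination_rate(answers, known_facts):
    total, unsupported = 0, 0
    for answer in answers:
        # Naive fact extraction: one "fact" per sentence.
        facts = [s.strip() for s in answer.split(".") if s.strip()]
        for fact in facts:
            total += 1
            if fact not in known_facts:
                unsupported += 1
    return unsupported / total if total else 0.0

known = {"Refunds are available within 30 days of purchase"}
answers = ["Refunds are available within 30 days of purchase. Refunds also apply after 90 days"]
print(hallucination_rate(answers, known))  # 0.5: one of the two facts has no reference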

You can then improve the agent on the Rippletide platform and re-test it. We believe that when an agent reaches less than 1% hallucinations, it can be deployed in production. Some use cases require 0.1% or even 0.01%, depending on volume or industry.
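
To put those thresholds in perspective, the acceptable rate really depends on volume, since the absolute number of wrong answers is what users end up seeing. A quick back-of-the-envelope (the 10,000 answers/day figure is just an example):

# Back-of-the-envelope: hallucinated answers per day at each rate,
# assuming a hypothetical agent serving 10,000 answers a day.
answers_per_day = 10_000
for rate in (0.01, 0.001, 0.0001):  # 1%, 0.1%, 0.01%
    print(f"{rate:.2%} -> {answers_per_day * rate:.0f} hallucinated answers/day")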

Feel free to ask any questions, or reach out if you’d like to know more about what we’re building.

Cheers,
Yann

Samet Sezer

Does the benchmarking feature allow us to compare historical runs side-by-side to track drift over time?

Yann BILIEN

Hi samet_sezer, we haven't implemented that yet - would that make sense to you?

In the meantime, tell me if you'd like to export your reports locally.

Samet Sezer

@y__b It would be a huge value add. Without historical comparison, it's hard to prove to stakeholders that the model is actually improving. I'll take the local export.
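
In the meantime, a manual version of that comparison could be as simple as keeping each run's exported report and diffing the headline metric across runs - assuming the export is a JSON file with something like a hallucination_rate field, which is just a guess at the format:

# Hypothetical drift check across locally exported reports; assumes each run
# exports a JSON file with a "hallucination_rate" field (not a documented format).
import glob
import json

runs = []
for path in sorted(glob.glob("reports/run_*.json")):
    with open(path) as f:
        report = json.load(f)
    runs.append((path, report["hallucination_rate"]))

for (prev_path, prev_rate), (cur_path, cur_rate) in zip(runs, runs[1:]):
    delta = cur_rate - prev_rate
    print(f"{cur_path}: {cur_rate:.2%} ({delta:+.2%} vs {prev_path})")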

PE Lieb

Congratulations on the launch 🚀

Salim Boujaddi

@pedward_lieb Thanks a lot for the support! 🙏

jiawei liu

Congrats on the launch! A lot of the time, evaluation metrics are heavily tied to the use case. Will the generated questions and evals be specifically tuned for particular scenarios/industries?

Curious Kitty
A lot of eval stacks lean on LLM-as-a-judge and people struggle with score variance and trust: what is Rippletide’s core approach to scoring hallucinations, and how do you handle the hardest case—when the agent’s answer is partially correct, partially unsupported, or the “truth” isn’t explicitly in the knowledge source?