Narev - Find a faster and cheaper LLM in minutes
Our objective was to set up an A/B test and see the results in 5 minutes.
We hit it.
We worked hard to make setting up a benchmark easier by:
1. improving model search
2. adding a smart model recommendation engine
3. adding support for publicly sharing the results


Replies
What is Narev? A BYO benchmark for LLMs (a replacement for evals).
What does Narev do? It answers "What's the best model for my XYZ use case?" (spoiler alert: no one knows, not even the LLM leaderboards).
How does Narev do it? We help you set up a quick benchmark on YOUR OWN data.
We already do A/B testing for product (LaunchDarkly) and marketing (Mailchimp, HubSpot), so let's do it for LLMs.
To get started, you integrate through one of the following:
import of your traces (we support pretty much every major provider)
calling our API (an OpenAI-compatible endpoint; see the sketch after this list)
file import (JSON, JSONL, CSV)
manual entry (yes, just type)
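
Since the API route is an OpenAI-compatible endpoint, your existing OpenAI client should work as-is. Here's a minimal sketch, assuming a standard chat completions route; the base URL and NAREV_API_KEY below are placeholders, so swap in the values from your dashboard:

```python
# Minimal sketch: point an OpenAI client at an OpenAI-compatible endpoint.
# The base URL and env var name are placeholders, not Narev's real values.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.narev.example/v1",  # placeholder endpoint
    api_key=os.environ["NAREV_API_KEY"],      # hypothetical env var
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # whichever model you want to benchmark
    messages=[
        {"role": "system", "content": "Extract the invoice total as JSON."},
        {"role": "user", "content": "Invoice #123, total due: $1,250.00"},
    ],
)
print(response.choices[0].message.content)
```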
Here is our blog post on GPT-3.5 beating GPT-5 at structured extraction, to get you inspired.
If you're going with the API endpoint, here are some guides (with a rough sketch of the labelled-data flow below them):
How to choose an LLM with a Kaggle and Google Colab notebook
How to choose an LLM (if your data is labelled) with a Kaggle and Google Colab notebook
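
To give a feel for what the labelled-data notebook walks through, the core idea is: run the same labelled examples through each candidate model and score the outputs against the gold labels. The models, prompt, and exact-match scoring below are illustrative assumptions, not the notebook's actual code:

```python
# Sketch of a labelled-data comparison: same examples, two candidate models,
# exact-match accuracy against the gold labels.
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Tiny labelled dataset: (input text, expected label)
examples = [
    ("The refund took three weeks and nobody answered my emails.", "negative"),
    ("Setup took two minutes and support was great.", "positive"),
]


def accuracy(model: str) -> float:
    """Fraction of examples where the model's label matches the gold label."""
    correct = 0
    for text, label in examples:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {
                    "role": "system",
                    "content": "Classify the sentiment as 'positive' or 'negative'. Reply with one word.",
                },
                {"role": "user", "content": text},
            ],
        )
        prediction = response.choices[0].message.content.strip().lower()
        correct += prediction == label
    return correct / len(examples)


for model in ["gpt-4o-mini", "gpt-3.5-turbo"]:
    print(model, accuracy(model))
```

Swap in your own data and metric; the same loop also extends to tracking cost and latency per model.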
For quick experiments, just enter the prompts manually through the UI.
If you're going with tracing, Narev integrates with @Helicone AI, @Langfuse, @LangSmith, and @W&B Weave by Weights & Biases.
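
If you haven't set up tracing yet, the usual pattern is to route your existing OpenAI calls through the provider's gateway so every request gets logged as a trace you can later import. A rough sketch with Helicone, based on their public proxy setup (double-check the URL and header against their current docs before relying on it):

```python
# Sketch of proxy-style tracing (Helicone shown): route OpenAI calls through
# the tracing gateway so every request/response is logged as a trace.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://oai.helicone.ai/v1",  # route requests through Helicone
    default_headers={
        "Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}",
    },
)

# Calls made through this client are captured as traces, which can then be
# imported as benchmark data.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize: the meeting moved to Friday."}],
)
print(response.choices[0].message.content)
```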
More on tracing in our docs.