Narev - Find a faster and cheaper LLM in minutes

Our objective was to let you set up an A/B test and see the results in 5 minutes. We hit it. We worked hard to make setting up a benchmark easier by (1) improving model search, (2) adding a smart model recommendation engine, and (3) adding support for publicly sharing the results.


What is Narev? BYO benchmark for LLMs (a replacement for evals).

What does Narev do? It answers "What's the best model for my XYZ use case?" (spoiler alert - no one knows, not even the LLM leaderboards).

How does Narev do it? We help you set up a quick benchmark on YOUR OWN data.

We already do A/B testing for product (LaunchDarkly) and marketing (Mailchimp, HubSpot), so let's do it for LLMs.

To get started, you can integrate through:

  • importing your traces (we support pretty much every major provider)

  • calling our API (OpenAI API endpoint)

  • file import (JSON, JSONL, CSV - see the sketch after this list)

  • manual entry (yes, just type)
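
For file import, the quickest route is a JSONL file with one prompt per line. Here is a minimal sketch of preparing such a file - the "messages" field layout is an assumption borrowed from the OpenAI chat format, so check the docs for the exact schema the importer expects:

import json

# Hypothetical import file: one JSON object per line, each holding
# an OpenAI-style messages array (assumed schema - see the docs)
prompts = [
    "Summarize this support ticket in one sentence.",
    "Extract the invoice total from the text below.",
]

with open("narev_import.jsonl", "w") as f:
    for p in prompts:
        record = {"messages": [{"role": "user", "content": p}]}
        f.write(json.dumps(record) + "\n")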

To get you inspired, here is our blog post on GPT-3.5 beating GPT-5 in structured extraction.

If you go with the API endpoint, here is a quick example:

import requests

# Narev credentials and the model to route to
NAREV_API_KEY = "<YOUR API KEY>"
NAREV_BASE_URL = "https://www.narev.ai/api/applica... ID>/v1"
STATE_OF_ART_MODEL = "openrouter:google/gemini-3-pro-preview"

# Example prompt - replace with messages from your own use case
messages = [
    {"role": "user", "content": "Summarize this support ticket in one sentence."}
]

# Narev exposes an OpenAI-compatible chat completions endpoint
response = requests.post(
    url=f"{NAREV_BASE_URL}/chat/completions",
    headers={
        "Authorization": f"Bearer {NAREV_API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "model": STATE_OF_ART_MODEL,
        "messages": messages
    },
    timeout=60
)
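
Since the endpoint is OpenAI-compatible, the response follows the standard chat completions shape. A minimal sketch of reading the reply (assuming the call succeeded):

# Fail loudly on HTTP errors, then pull the assistant reply
# out of the OpenAI-style chat completions response body
response.raise_for_status()
reply = response.json()["choices"][0]["message"]["content"]
print(reply)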

For quick experiments, just enter the prompts manually through the UI.

If you go with tracing, Narev integrates with @Helicone AI, @Langfuse, @LangSmith, and @W&B Weave by Weights & Biases.

More on tracing in our docs.