LLM Stats

Compare API models by benchmarks, cost & capabilities


LLM Stats is the go-to place to analyze and compare AI models across benchmarks, pricing, and capabilities. You can easily compare model performance through our playground and API, which give you access to hundreds of models at once.
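To give a sense of what "hundreds of models behind one API" looks like in practice, here is a minimal sketch of sending the same prompt to two models through a single gateway. The base URL, the /v1/chat/completions route, the response shape, and the model IDs are all assumptions made for illustration; they are not the documented LLM Stats API.

    # Minimal sketch: comparing two models through one gateway API.
    # The base URL, route, response shape, and model IDs below are
    # hypothetical placeholders, not the documented LLM Stats API.
    import requests

    BASE_URL = "https://api.example-gateway.com/v1/chat/completions"  # hypothetical
    API_KEY = "YOUR_API_KEY"

    def ask(model: str, prompt: str) -> str:
        """Send one prompt to one model and return its reply text."""
        resp = requests.post(
            BASE_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
            timeout=60,
        )
        resp.raise_for_status()
        # Assumes an OpenAI-style response body.
        return resp.json()["choices"][0]["message"]["content"]

    if __name__ == "__main__":
        prompt = "Summarize the trade-off between model cost and benchmark score."
        for model in ["provider-a/model-x", "provider-b/model-y"]:  # placeholder IDs
            print(f"--- {model} ---")
            print(ask(model, prompt))

The point of routing through one gateway is that swapping the model ID is the only change needed to compare providers side by side.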

Jonathan Chávez

Hey Makers! 👋

I’m Jonathan, one of the creators of LLM Stats, a community-first leaderboard for comparing language models across benchmarks, costs, and more.

Quick backstory: this project was born late last year out of a personal need. I was spending hours scouring different sources trying to figure out which models were best for another project I was working on.

Now, we're working towards building the best semi-private, open, and reproducible AI benchmarking community. We believe there's a growing need for independent benchmarks and environments that measure AI progress in areas like coding, science, visuals, and long-horizon tasks.

We're backed by Y Combinator and by leaders from Hugging Face, Harvard Medical School, Daytona, Insight Data Science, and many more.

Would love to hear your thoughts, see you on the platform.

Chris Hicken

@jonathanchavez This is exactly what the AI community needs right now! How do you plan to keep the benchmarks up-to-date as new models are released? It’d be great to see how you handle emerging models over time.

Jonathan Chávez

@chrishicken 

Thanks Chris!

We are currently building custom benchmarks that we will run on each model release.

We’re partnering with some companies to build new, difficult evals that are more expensive to run.

The challenge is keeping all eval data fresh so consumers of these models can make better decisions.
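For readers curious what "run on each model release" might look like, here is a toy sketch of that loop. The registry function, eval suite, and result records are invented stand-ins for illustration, not LLM Stats internals.

    # Toy sketch of a "re-run evals on each new model release" loop.
    # The registry, eval suite, and result records are hypothetical
    # stand-ins for illustration, not LLM Stats internals.
    from datetime import datetime, timezone

    # Placeholder scorers; real evals would call the model under test.
    EVAL_SUITE = {
        "coding": lambda model_id: 0.0,
        "long_context": lambda model_id: 0.0,
    }

    seen_models: set[str] = set()
    results: list[dict] = []

    def fetch_model_registry() -> list[str]:
        """Hypothetical: return the current list of released model IDs."""
        return ["provider-a/model-x", "provider-b/model-y"]

    def run_evals_for(model_id: str) -> None:
        """Score one model on every eval and timestamp the results."""
        for eval_name, scorer in EVAL_SUITE.items():
            results.append({
                "model": model_id,
                "eval": eval_name,
                "score": scorer(model_id),
                "run_at": datetime.now(timezone.utc).isoformat(),
            })

    # On each polling pass, only newly released models trigger a fresh run.
    for model_id in fetch_model_registry():
        if model_id not in seen_models:
            seen_models.add(model_id)
            run_evals_for(model_id)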

Gabe Perez

I really like how the information is displayed and aggregated. Literally made me use Grok 4 Fast more across apps that I use and well... dang. Would be really cool to have a @Raycast extension to compare models or do quick ranking lookups! Congrats on the launch :)

Sebastian Crossa

@gabe As a @Raycast power user myself, this would be 100% useful. Will cook something up over the weekend and send it over to you to test 👀

Alex Cloudstar

Been drowning in model docs lately; pricing vs. quality is a headache. Having cost per 1k tokens next to coding and long-context scores on one site is useful. I care about fresh data, so I'll be watching how often runs update. Will try the playground against my own prompts.

Jonathan Chávez

@alexcloudstar We’ll be adding better viz over time. I’m glad you’re trying it out.

Don’t hesitate to reach out if you find any bugs! We’ll fix them ASAP.

Luigi Pederzani

I'm an avid user of LLM Stats, and I think the chat feature is amazing!

Congrats on the launch @sebastiancrossa @jonathanchavez 👏

Sebastian Crossa

@jonathanchavez  @pederzh Thank you for the support, Luigi! You're the man.

Mahmoud Al-Madi

Let’s goo 🔥🔥🙌🙌🚀

Jonathan Chávez

@mahmoud_al_madi1  🚀🚀🚀

Abdul Rehman

Do you plan to include real-world tests like long conversations, tool use, or memory tasks? Those are becoming important lately.

Sebastian Crossa

@abod_rehman Yes! Apart from some of the existing benchmarks around tool use and memory, we're working on our own independent benchmarks, which we're aiming to make as transparent and reproducible as possible.

Coming soon!

Mahmoud Mabrouk

The LLM rankings are actually very nice! Very helpful! I hope you continue maintaining them.

Sebastian Crossa

@mabrouk We 100% will!

Is there anything we could do to improve it?
