The Multivac

Which LLM thinks best? Blind peer-judged leaderboard.

4 followers

Most LLM leaderboards are static, gameable, or judged by a single model. The Multivac runs a 10×10 blind peer matrix: every frontier model answers, then judges every other model's answer without knowing whose it is. What you get is a ranking of reasoning quality, not memorized benchmarks. Features: Ask Multivac (live multi-model answers + share pages), Model Pulse heatmap, head-to-head Compare, full data export, and an open-source evaluation engine (MIT).
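To make the blind peer matrix concrete, here is a minimal sketch of how such a ranking could be computed. It is not taken from The Multivac's evaluation engine: the function names (`blind_labels`, `peer_ranking`), the numeric score scale, the toy scores, and the choice of a plain mean as the aggregate are all assumptions made for illustration.

```python
import random
from statistics import mean


def blind_labels(models, seed=None):
    """Map each model name to an anonymous label in shuffled order.

    Hypothetical helper: judges would see only the label, never the name.
    """
    rng = random.Random(seed)
    shuffled = list(models)
    rng.shuffle(shuffled)
    return {name: f"Answer {i + 1}" for i, name in enumerate(shuffled)}


def peer_ranking(scores):
    """Rank models by the mean score received from every other model.

    scores[judge][author] is the score `judge` gave to `author`'s answer.
    Self-judgments are skipped, so each model is rated only by its peers.
    Assumption: a simple mean; the real engine may aggregate differently.
    """
    models = list(scores)
    averages = {
        author: mean(
            scores[judge][author] for judge in models if judge != author
        )
        for author in models
    }
    # Highest average peer score first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Toy 3x3 matrix with made-up scores (the page describes a 10x10 matrix).
scores = {
    "model_a": {"model_b": 7.0, "model_c": 6.0},
    "model_b": {"model_a": 9.0, "model_c": 5.0},
    "model_c": {"model_a": 8.0, "model_b": 6.0},
}
ranking = peer_ranking(scores)
print(ranking)
```

With ten models, the same structure holds: each model receives nine peer scores, and the off-diagonal cells of the 10×10 matrix are the only ones that count toward the ranking.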

The Multivac makers

Here are the founders, developers, designers, and product people who worked on The Multivac.

Yash Darji (Building Aligned AI)
