Launching today
The Multivac

Which LLM thinks best? Blind peer-judged leaderboard.

Most LLM leaderboards are static, gameable, or judged by a single model. The Multivac runs a 10×10 blind peer matrix: every frontier model answers each question, then judges every other model's answer without knowing which model wrote it. The result is a ranking of reasoning quality rather than of memorized benchmark answers. Features: Ask Multivac (live multi-model answers + share pages), Model Pulse heatmap, head-to-head Compare, full data export, and an open-source evaluation engine (MIT license).
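To make the blind peer matrix concrete, here is a minimal sketch of how peer scores could be turned into a ranking. It is not The Multivac's actual engine: the function name `rank_by_peer_scores`, the numeric score scale, the placeholder model names, and the choice to average scores and ignore self-judgments are all assumptions made for illustration.

```python
from statistics import mean


def rank_by_peer_scores(scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank models by the mean score their answers received from peers.

    scores[judge][author] is the score `judge` gave to `author`'s
    anonymized answer. Self-judgments are ignored (an assumption).
    """
    models = sorted(scores)
    ranking = []
    for author in models:
        received = [
            scores[judge][author]
            for judge in models
            if judge != author and author in scores[judge]
        ]
        if received:  # skip models that no peer has judged
            ranking.append((author, mean(received)))
    # Highest mean peer score first.
    ranking.sort(key=lambda pair: pair[1], reverse=True)
    return ranking


# Hypothetical 3x3 matrix standing in for the full 10x10 one.
scores = {
    "model_a": {"model_a": 10.0, "model_b": 7.0, "model_c": 5.0},
    "model_b": {"model_a": 9.0, "model_c": 6.0},
    "model_c": {"model_a": 8.0, "model_b": 6.0},
}
ranking = rank_by_peer_scores(scores)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this sketch `model_a`'s self-assigned 10.0 has no effect on its rank, since only scores from other models count toward the average.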
Free