Atlas

Independent Evals and Benchmarks for GenAI models

Atlas, by LayerLens, is a community resource that provides insight into the performance of top AI models through independent evals on benchmarks such as MATH, HumanEval, and MMLU. We are data-first and provide a full suite of analytics for our benchmarks.

Archie Chaudhury
We are a team of developers, engineers, and data scientists who constantly found ourselves asking "What is the best AI model for X?" As it turns out, this was not a straightforward question. Most benchmarks for frontier AI models came from the model creators themselves, or relied on crowdsourced "arena"-style leaderboards that often felt subjective. Objective benchmarks did exist, but there was no easy way to get independent results for them without setting up the evaluation pipelines yourself.

We built Atlas with analytics as a first principle: we believe generative AI should be held to the same standards as traditional software. Atlas currently has the largest suite of benchmarks (over 50) of any public leaderboard, and it provides traces for individual prompts, which no other leaderboard does.
Farrukh Anwaar
A clean and credible view of how AI models actually perform. Congrats on the launch. We just launched Mukh.1 too — AI agents that take care of the everyday stuff. Give it a look!