
Arena
Benchmark and compare the best AI models
456 followers
Arena is an open platform to evaluate, benchmark, compare, and test frontier AI models.
This is the 2nd launch from Arena.
Arena Agent Mode
Launching today
Most AI benchmarks test models in controlled environments. Agent Mode tests them on complex tasks to get more work done. Run autonomous agents that browse, research, code, use files, and complete multi-step workflows from a single prompt. Then watch each workflow unfold step by step. Every run contributes to the Agent Arena Leaderboard, ranking frontier models by real-world agentic performance.

Free
Launch Team

Arena
Hey Product Hunt! We're excited to launch Agent Mode on Arena.
AI chat experiences are often limited to rigid, single-modality interactions that require switching tools or additional prompting. Agent Mode changes that. You can now prompt once, and the agent will plan, browse, research, and code in a sandboxed testing environment to complete real-world, multi-step tasks for you.
Every Agent Mode session also powers our new Agent Leaderboard, built entirely from behavioral signals (such as confirmed success, bash recovery, steerability, and more) collected from real users running real-world workflows. We're excited to have our community contribute to the leaderboard and help set a new standard for measuring AI progress.
We'd love your feedback: What agentic tasks did you throw at it? What tools should we add next? Thanks for checking it out!
Earth.fm
One thing I appreciate about Arena is that it shifts the conversation from "which model is trending" to "which model actually performs best for my use case." With the pace of AI innovation today, having a reliable way to evaluate and compare models is incredibly valuable. This feels like a product that can help builders make smarter decisions instead of relying on assumptions or marketing.
Congratulations on the launch! Excited to see how the platform evolves and serves the AI community!
CheckYa
Arena feels like a much-needed reality check for the AI space. Instead of guessing or trusting scattered benchmarks, it brings everything into one place where models can be evaluated side by side in a practical way. For anyone building with AI, this kind of clarity is extremely valuable. Excited to see how it grows and how the community contributes to making AI evaluation more transparent and useful over time.
Uselink
The UI alone is pretty awesome. You guys really have that "taste", Elliott.
I just tested it, and it's mind-blowing