
Arena
Benchmark and compare the best AI models
406 followers
Arena is an open platform to evaluate, benchmark, compare, and test frontier AI models.
This is the 2nd launch from Arena.
Arena Agent Mode
Most AI benchmarks test models in controlled environments. Agent Mode puts them to work on complex, real-world tasks instead. Run autonomous agents that browse, research, code, use files, and complete multi-step workflows from a single prompt, then watch each workflow unfold step by step. Every run contributes to the Agent Arena Leaderboard, which ranks frontier models by real-world agentic performance.

Free
Launch Team
Hey Product Hunt! We're excited to launch Agent Mode on Arena.
AI chat experiences are often limited to rigid, single-modality interactions that require switching tools or additional prompting. Agent Mode changes that. You can now prompt once, and the agent will plan, browse, research, and code in a sandboxed environment to complete real-world, multi-step tasks for you.
Every Agent Mode session also powers our new Agent Leaderboard, built entirely from behavioral signals (such as confirmed success, bash recovery, steerability, and more) collected from real users running real-world workflows. We're excited to have our community contribute to the leaderboard and help set a new standard for measuring AI advancement.
We'd love your feedback: What agentic tasks did you throw at it? What tools should we add next? Thanks for checking it out!