👋 Hey Product Hunt! We're excited to launch Agent Mode on Arena.

AI chat experiences are often limited to rigid, single-modality interactions that require switching tools or

additional prompting. Agent Mode changes that. You can now prompt once and the agent will plan, browse,

research, and code in a sandbox testing environment to complete real-world, multi-step tasks for you.

Every Agent Mode session also powers our new Agent Leaderboard, built entirely from behavioral signals (such as confirmed success, bash recovery, steerability, and more) collected from real users running real-world workflows. We’re excited to have our community contributing to the leaderboard, and provide a new standard for measuring AI advancement.

We'd love your feedback: What agentic tasks did you throw at it? What tools should we add next? Thanks for checking it out 🙏

👋 Hey Product Hunt! We're excited to launch Agent Mode on Arena.

AI chat experiences are often limited to rigid, single-modality interactions that require switching tools or

additional prompting. Agent Mode changes that. You can now prompt once and the agent will plan, browse,

research, and code in a sandbox testing environment to complete real-world, multi-step tasks for you.

We'd love your feedback: What agentic tasks did you throw at it? What tools should we add next? Thanks for checking it out 🙏

Arena

Benchmark and compare the best AI models

Benchmark and compare the best AI models

Arena Agent Mode