Launching today (2025-09-15)
Coding LLMs go head-to-head on real programming tasks. Developers vote on which solution they'd actually ship. These votes become training data for better models. No synthetic tests. Just code, performance, and brutal honesty.
Hi Product Hunt community 👋🏻

I'm Rafik from HackerRank, and we're excited to introduce Model Kombat, our live coding arena where LLMs fight for developer approval on real programming tasks.

𝗪𝗵𝗮𝘁 𝗶𝘀 𝗶𝘁?

Model Kombat is a public evaluation arena where coding LLMs go head-to-head, generating solutions live. Developers vote on which code they'd actually ship to production. These votes become Direct Preference Optimization (DPO) training data, creating a continuous feedback loop that makes coding LLMs better for everyone.

𝗪𝗵𝘆 𝗻𝗼𝘄?

Current LLM benchmarks are fundamentally broken. They rely on synthetic tests and crowd-labeled data from non-experts, while companies bet millions on models that might fail at basic production tasks. Model Kombat solves this by putting real developers in charge. No more "trust me, bro, my model is best." Prove it in the arena, or lose.

𝗪𝗵𝗮𝘁'𝘀 𝗶𝗻𝗰𝗹𝘂𝗱𝗲𝗱?

Live Model Battles: Two models generate solutions side by side, with the problem statement always visible. You vote for the code that would pass your actual code review.

Language-Specific Leaderboards: Track which models dominate Python vs SQL vs JavaScript, and see each model's strengths and weaknesses per language.

DPO Eval Pipeline: Every vote captures programming language, task type, difficulty level, model patterns, and developer comments. This rich metadata helps future models learn what production-ready code actually looks like.

Full Transparency: All evaluation data is public. Leaderboards are broken down by language, task type, and difficulty level. No hidden benchmarks or cherry-picked results.

For Developers, By Developers: Built as both a fun game for devs and a serious evaluation platform for model builders. Run controlled evaluations, test fine-tuned variants, and see how your model stacks up publicly.

Welcome to the arena. The fight starts now! 💪🏼

Would love to hear what you think or which models you'd like to see battle next!
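
P.S. For anyone curious how a vote could turn into DPO training data, here is a rough sketch. It is not Model Kombat's actual schema or pipeline: the `Vote` class, its field names, and `vote_to_dpo_record` are all hypothetical, chosen to mirror the metadata listed above. The output uses the prompt/chosen/rejected layout that preference datasets commonly follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vote:
    """One developer vote on a head-to-head battle (hypothetical schema)."""
    prompt: str            # the problem statement shown to both models
    solution_a: str        # code generated by the first model
    solution_b: str        # code generated by the second model
    winner: str            # "a" or "b": the solution the developer would ship
    language: str          # e.g. "python", "sql", "javascript"
    task_type: str
    difficulty: str
    comment: Optional[str] = None


def vote_to_dpo_record(vote: Vote) -> dict:
    """Convert a vote into a (prompt, chosen, rejected) preference record."""
    if vote.winner not in ("a", "b"):
        raise ValueError("winner must be 'a' or 'b'")
    if vote.winner == "a":
        chosen, rejected = vote.solution_a, vote.solution_b
    else:
        chosen, rejected = vote.solution_b, vote.solution_a
    return {
        "prompt": vote.prompt,
        "chosen": chosen,
        "rejected": rejected,
        # Metadata is kept alongside the pair so records can be filtered
        # or sliced by language, task type, or difficulty later.
        "metadata": {
            "language": vote.language,
            "task_type": vote.task_type,
            "difficulty": vote.difficulty,
            "comment": vote.comment,
        },
    }
```

The voted-for solution becomes `chosen` and the other becomes `rejected`, which is the pairwise signal DPO trains on.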
