Kimi K2.6 - Open-source SOTA for long-horizon coding and agent swarms

Flowtica Scribe

3mo ago

Kimi K2.6 is Moonshot’s latest open-source model, built to push coding, long-horizon execution, and agent swarms forward at the same time. It brings stronger end-to-end coding, 300-agent swarm orchestration, and improved reliability for always-on agent frameworks like OpenClaw and Hermes.

Replies

Kimi AI - Now with K2.6

Maker

Hey PH 👋

Kimi K2.6 is our latest open-source model, built for long-horizon coding and agents - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).

Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), CharXiv w/ python (86.7), Math Vision w/ python (93.2).

Live at kimi.com, the app, API, and Kimi Code. Would love your feedback :)

3mo ago

@crystal_j For a non-coder like me scripting PH launch trackers, how does Kimi K2.6 handle multi-step tool chains with error recovery? For example, if an API flakes or a prompt needs a human tweak mid-flow?

3mo ago

Flowtica Scribe

Hunter

I’ve been on K2.6-code-preview for a while, and now it’s officially K2.6. It has been kind of wild!

The model really shines on long-horizon coding: thousands of tool calls across hours of continuous execution, strong generalization across languages and tasks, plus the ability to generate rich, animated frontends with real motion and 3D elements. The agent swarm upgrades (300 parallel sub-agents) and proactive 24/7 agent support also feel like a meaningful step up.

As always, Kimi keeps delivering frontier-level models as open source. Respect🫡🫡

3mo ago

DiffSense

@zaczuo What's long-horizon coding? What do you use it for? Thousands of calls? I do at most 100 calls on a PR. How well does it compare to Opus 4.7? I heard the previous Kimi was almost as good as Opus 4.6.

3mo ago

Flowtica Scribe

Hunter

@conduit_design My team and I mainly use it for heavy debugging in our recent Android sprint. The deep bugs it surfaced were no weaker than what 5.4 finds.

3mo ago

DiffSense

@zaczuo Ahh, that's really smart. Just use it as a smart UI tester / unit tester.

3mo ago

Kilo Code

K2.6 offers SOTA-level performance at a fraction of the cost.

It's open-weights, fast, and optimized for long-context tasks across the codebase, as well as for the day-to-day work needed to support an always-on agent like @OpenClaw and @KiloClaw.

Impressive.

3mo ago

Brila

How strict is Kimi with sensitive topics? How would you rate it against the big three US models on filter sensitivity toward information security, copyright, interpersonal boundaries, etc.?

I'm not talking about explicitly dangerous activity, but about legitimate tasks that occasionally trigger the filters. An example is Claude Code refusing to configure the Microsoft Entra dashboard because it looks like a hacker attack to it.

3mo ago

300-agent swarm orchestration is wild — curious how reliable the long-horizon execution actually is in practice. Anyone tried it on multi-hour coding sessions yet?

3mo ago

How do Kimi Code Allegretto and Moderato compare to the Claude or Gemini quotas? I have both Pro subscriptions, and I get through the week by consuming both quotas.

3mo ago

Solid open-weights drop. How does K2.6 compare to Claude Sonnet on multi-file refactors where you need to hold the call graph across 30+ files? The SWE-bench score looks great, but I'm curious about real-world agent loops where context drift kills smaller models.

3mo ago

The 300 parallel sub-agents thing is wild. Most coding agents I've used top out at around 5-10 concurrent tool calls before they start stepping on each other. If Kimi K2.6 can actually coordinate 300 without losing coherence, that's a genuine architectural advantage, not just a benchmark flex. How does it handle conflicting edits when multiple agents touch the same file?

3mo ago

Humalike

300-agent swarm orchestration as a default capability is the bet I want to see real numbers on. Curious about the failure mode at scale: when one of the 300 sub-agents goes off-track or hallucinates a tool call, does K2.6 surface that to the orchestrator early, or does it propagate quietly through the swarm? The recovery semantics matter more than peak SWE-bench at this fan-out.

3mo ago