Kimi K2.7 Code - Kimi’s most capable coding model yet
Kimi K2.7 Code is Moonshot AI’s latest coding-focused agentic model, built for long-horizon software engineering. It offers a 256K context window, multi-step tool use, multimodal inputs, and around 30% lower reasoning-token usage than K2.6. It is available in Kimi Code, through the Kimi API, and as open weights and code.


Replies
Flowtica Scribe
Hi everyone!
Kimi K2.7 Code is open-weights and focuses on improving real-world long-horizon coding performance. Compared with K2.6, it shows clear gains in instruction following over long contexts and higher success rates on multi-step coding tasks.
It also reduces overthinking quite a bit, with around 30% lower reasoning-token usage than K2.6. The model runs with thinking mode on by default and has better support for vision + tool calling in agent workflows.
Kimi Code has already upgraded its default model to K2.7 Code, and a 6x faster high-speed version is coming!
Interesting model. The 30% lower reasoning-token count is notable. Does that also reduce latency proportionally for typical multi-step tasks?
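A rough back-of-the-envelope sketch of why the answer is probably "not proportionally": reasoning tokens are only one part of end-to-end latency, next to prompt prefill, the final answer tokens, and time spent waiting on tools. Every number below (token counts, throughputs, tool time) is a made-up assumption for illustration, not a measurement of K2.7 Code or K2.6, and `estimated_latency_s` is a hypothetical helper rather than anything from the Kimi API.

```python
def estimated_latency_s(prompt_tokens, reasoning_tokens, answer_tokens,
                        prefill_tps, decode_tps, tool_time_s=0.0):
    """Very simplified latency model: prefill + sequential decode + tool wait.

    All inputs are illustrative assumptions, not measured values.
    """
    prefill = prompt_tokens / prefill_tps                    # reading the prompt
    decode = (reasoning_tokens + answer_tokens) / decode_tps  # generating tokens
    return prefill + decode + tool_time_s


# Hypothetical single agent step: 20K-token prompt, 4K reasoning tokens,
# 1K answer tokens, 10 s of tool execution (e.g. running a test suite).
baseline = estimated_latency_s(20_000, 4_000, 1_000,
                               prefill_tps=5_000.0, decode_tps=50.0,
                               tool_time_s=10.0)

# Same step with 30% fewer reasoning tokens, everything else unchanged.
reduced = estimated_latency_s(20_000, 4_000 * 0.7, 1_000,
                              prefill_tps=5_000.0, decode_tps=50.0,
                              tool_time_s=10.0)

saving = 1 - reduced / baseline
print(f"baseline: {baseline:.1f}s, reduced: {reduced:.1f}s, saving: {saving:.1%}")
```

Under these assumptions the latency saving comes out smaller than 30%, because prefill, answer tokens, and tool time are unaffected. The more a task is dominated by tool calls or long prompts, the smaller the share of latency that shorter reasoning can recover.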
The 30% drop in reasoning tokens alongside better multi-step task success is the interesting signal here. It suggests you're pruning unproductive reasoning chains rather than just thinking less. We've seen agent costs spiral on complex multi-turn tasks because of runaway chain-of-thought. How did you train the model to distinguish productive reasoning steps from redundant ones?
Interesting launch. For coding-focused models, the thing I’d want to test is not just generation quality, but how well it handles long-running repo work: keeping context clean, explaining risky changes, and recovering after failed tests.
Humalike
Congrats on the launch! How do you plan to handle all the user load you'll receive after launch?
Local Panel
To be honest, I really like Kimi, but this time the benchmarks are a bit below my expectations; they seem only slightly better than K2.6. But I really appreciate that you’re open-source and constantly striving to improve. Thanks, team.
The open-weights + 256K context combination is what I'd test first, especially on a repo task where the model has to keep tool outputs, diffs, and failed test logs straight. Lower reasoning-token usage is useful, but the tradeoff I wonder about is recovery after the agent makes a bad edit. Do you have evals that measure whether K2.7 can backtrack from a failed test run without losing the original instruction?
Mailwarm
Congrats on today's launch!!
A friend recommended Kimi to me a few months ago and I've had a pretty good experience so far. The open-weight approach is also a big plus. Excited to see where K2.7 goes from here.