Devstral 2 is the new SOTA open-weight coding family, achieving 72.2% on SWE-bench Verified. It ships with Mistral Vibe, an open-source CLI agent for end-to-end code automation. Currently free via API.
Replies
Flowtica Scribe
Hi everyone!
Mistral just raised the bar for open-weight coding models. Devstral 2 (123B) hits 72.2% on SWE-bench Verified, effectively making it the new SOTA in the open-source space.
It rivals larger open models like DeepSeek V3.2 and gets surprisingly close to closed models like Claude Sonnet 4.5, but at a fraction of the inference cost. The smaller 24B version runs locally on consumer hardware but still punches above its weight.
They also released Mistral Vibe, a native CLI agent that handles end-to-end code automation right in your terminal.
The API is currently free to use!
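Since the weights are open, the flip side of the free API is that you can also serve the model yourself. Here is a minimal sketch of what a request looks like, assuming you run an OpenAI-compatible server (e.g. vLLM or llama.cpp) locally; the endpoint URL and the model name `devstral-2` are placeholders, not official values:

```python
import json

# Hypothetical local OpenAI-compatible endpoint (vLLM, llama.cpp, etc.).
API_URL = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str, model: str = "devstral-2") -> str:
    """Serialize a minimal chat-completions payload for a coding prompt."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a coding assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,  # low temperature suits code generation
    }
    return json.dumps(payload)

body = build_request("Write a function that reverses a linked list.")
# POST `body` to API_URL with the HTTP client of your choice.
```

The same payload works against the hosted API by swapping the base URL and adding an API key header, which is the main appeal of OpenAI-compatible serving.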
Camocopy
Yeah, what can I say? It's another Mistral release within a week, and I tested it yesterday. I must admit it's really good, and Devstral Small punches above its weight. Runs smoothly on my M4 chip, can't complain yet 👌🏻
Congratulations... finally something from Europe. I currently use Claude Code. How does it fare in terms of data protection (is user data used for model training)? Now it just needs to work as well as Claude Code.
72.2% on SWE-bench is legit. Open-weight coding models being competitive with closed ones is huge for dev autonomy.
Q: How does the latency compare for real-time IDE integration? Also, is the Vibe CLI available now, or just the model weights?
Shipping this matters! 🚀
Would anyone know of a Cursor-like alternative that can use such models running locally? I know about Cursor plus a locally running model exposed via ngrok, but I'm looking for something a bit more solid.
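One option worth checking is Continue, an open-source VS Code/JetBrains extension that can point at any OpenAI-compatible local server. A config sketch, assuming the legacy `config.json` schema and a local server on port 8000 — field names and the model tag may differ by version, so treat this as an illustration rather than a verified config:

```json
{
  "models": [
    {
      "title": "Devstral Small (local)",
      "provider": "openai",
      "model": "devstral-small",
      "apiBase": "http://localhost:8000/v1",
      "apiKey": "none"
    }
  ]
}
```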
It's wild that a 7B model is beating Llama 13B on reasoning benchmarks. The sliding-window attention seems to be doing a lot of the heavy lifting here. Has anyone tried fine-tuning this for specific RAG tasks yet? I'm wondering how fragile the reasoning gets once you saturate the context window.
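For anyone unfamiliar with the mechanism the comment refers to: under sliding-window attention, each token attends only to itself and the previous `window - 1` tokens instead of the full history. A toy sketch of the attention mask (illustration only, not Mistral's implementation):

```python
# Toy sliding-window causal attention mask: mask[i][j] is True when
# token i is allowed to attend to token j.
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    return [
        [max(0, i - window + 1) <= j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=5, window=3)
# Token 4 may attend to positions 2, 3, 4 only:
# mask[4] == [False, False, True, True, True]
```

Information from outside the window can still propagate indirectly across layers, which is why the effective receptive field grows with depth even though each layer's window is fixed.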
Wow, Mistral AI looks amazing! The Devstral 2 SWE-bench score is incredible. How easily does Mistral Vibe integrate with existing CI/CD pipelines for automated testing?