
Mercury
The First Commercial-Scale Diffusion LLM
177 followers
Mercury, from Inception Labs, is the first commercial diffusion LLM. Up to 10x faster than autoregressive models, with comparable or better quality on coding tasks.
This is the 3rd launch from Mercury.

Mercury Edit 2
Launching today
Mercury Edit 2 is a coding-focused diffusion LLM built specifically for next-edit prediction. It uses your recent edits and codebase context to suggest the next change, with much higher acceptance and much lower latency than typical code-edit models.





Flowtica Scribe
Hi everyone!
Mercury Edit 2 is not a general chat model for coding. It is purpose-built for next-edit prediction, one of the most latency-sensitive parts of dev workflows.
The interesting part is that it is built on a diffusion architecture, so it generates tokens in parallel instead of one by one, which is exactly why it can feel so fast. Inception is claiming 75.6% quality at 221ms, plus a 48% higher accept rate and 27% fewer shown edits than the previous version.
If you use @Zed, there is a specific API key that unlocks a free 1-month trial.
You can find the configuration tutorial here.
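The speed claim above comes down to how many model steps it takes to fill in a suggestion. A toy sketch can make the contrast concrete: an autoregressive decoder commits one position per step, while a diffusion-style decoder starts from a fully masked sequence and unmasks several positions in parallel each refinement step. This is a deliberately simplified illustration, not Mercury's actual algorithm; the `confident_per_step` batching stands in for whatever confidence schedule a real diffusion LLM uses.

```python
def autoregressive_decode(target):
    """One token per model call: N tokens cost N steps."""
    out, steps = [], 0
    for tok in target:
        out.append(tok)  # each appended token is one sequential model call
        steps += 1
    return out, steps


def diffusion_decode(target, confident_per_step=4):
    """Start fully masked; each step fills several positions in parallel."""
    seq = ["<mask>"] * len(target)
    steps = 0
    while "<mask>" in seq:
        masked = [i for i, t in enumerate(seq) if t == "<mask>"]
        # stand-in for the model being confident about a batch of positions at once
        for i in masked[:confident_per_step]:
            seq[i] = target[i]
        steps += 1
    return seq, steps


target = "def add ( a , b ) : return a + b".split()  # 12 tokens
_, ar_steps = autoregressive_decode(target)
_, diff_steps = diffusion_decode(target)
print(ar_steps, diff_steps)  # 12 sequential steps vs 3 parallel refinement steps
```

Both decoders produce the same sequence; the diffusion-style one just reaches it in far fewer sequential steps, which is where the latency headroom comes from.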
@zaczuo The parallel token generation part is really interesting — does that noticeably change the “feel” compared to traditional models?
Cue
Congrats on the launch! A diffusion LLM purpose-built for next-edit prediction is a really interesting angle, and the latency advantage over autoregressive models seems like it could be huge for IDE integrations. Are you seeing the biggest gains in specific languages or is it pretty consistent across the board?
Diffusion architecture for code prediction is a bold bet. Most completion tools just throw a bigger autoregressive model at the problem and call it a day. Curious about the latency in practice though. 221ms on paper vs 221ms when you're mid-flow writing Flutter code are very different things. Does it handle Dart well or is it mostly tuned for the usual Python/JS suspects?
The 'next-edit' framing is interesting - it's predicting intent rather than continuation. How does it handle non-local edits? Like, you rename a function and it needs to chase all the call sites. Is that in scope or is this more single-cursor stuff?
Features.Vote
is 'next-edit prediction' meaningfully different from standard autocomplete, or is that just a frame for faster completions?
the diffusion architecture is where this gets interesting. autoregressive models generate one token at a time, so by the time you've generated a complete suggestion for one location, it's too slow to extend across several. diffusion generates token positions in parallel, which makes 'what else changes after this edit' tractable at 221ms. the 48% higher accept rate is the number that actually matters here. low accept rates train developers to dismiss suggestions without reading them. if mercury edit 2 is genuinely better at predicting which edits to surface next, that changes the daily feel more than raw latency numbers do.