Edgee compresses tokens before they reach LLM providers, reducing the token cost by up to 50%. Same code, fewer tokens, lower bills.
This is the 6th launch from Edgee.
Edgee Turbo Models
Launched this week
Run state-of-the-art open-source models (GLM 5.1, Kimi K2.7 Code, MiniMax M2.7, and more) in Claude Code at up to 4× the speed (up to 200 tok/s) for a flat $29/month. Set up in minutes, no code changes.




Congrats on the launch, Sacha! Curious how Edgee handles consistency during peak demand, when multiple teams are hitting the same Turbo endpoints simultaneously?
Edgee
@crystalmei Thanks 🙏
Three layers handle this:
1/ Gateway: requests don't queue locally. Per-provider HTTP/2 connection pools with multiplexing, horizontal scaling, no shared serialization point. A burst from one team doesn't slow down another.
2/ Inference: Turbo runs on dedicated high-throughput infra with our partners (@Together AI, @Fireworks - Fastest Inference for Generative AI, and others depending on the model). They auto-scale and load-balance across replicas. We've sized partnerships with headroom for peak hours.
3/ Fallback: if a Turbo lane is ever saturated, Edgee routes automatically to a standard endpoint of the same model family. The agent loop never breaks; the user might just see a slightly slower response for that call.
Plus per-key rate limiting at the team level so one heavy user doesn't degrade the experience for the others.
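To make the fallback and per-key limiting described above concrete, here is a minimal sketch in Python. It is an illustration only, not Edgee's implementation: `TokenBucket`, `LaneSaturated`, `route`, and the `turbo`/`standard` callables are all hypothetical names, and a token bucket is just one common way to do per-key rate limiting.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Hypothetical per-key limiter: refills `rate` tokens/second up to `capacity`."""
    rate: float
    capacity: float
    tokens: float = 0.0
    updated: float = field(default_factory=time.monotonic)

    def __post_init__(self):
        # Start with a full bucket so a new key can burst up to `capacity`.
        self.tokens = self.capacity

    def allow(self):
        now = time.monotonic()
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class LaneSaturated(Exception):
    """Raised by the (hypothetical) Turbo lane when it has no capacity left."""


def route(api_key, request, buckets, turbo, standard):
    """Apply the key's own limit, try the Turbo lane, fall back if saturated."""
    # Each key has its own bucket, so one heavy key cannot drain another's budget.
    if not buckets[api_key].allow():
        return {"status": 429, "lane": None}
    try:
        return {"status": 200, "lane": "turbo", "body": turbo(request)}
    except LaneSaturated:
        # Same model family, standard endpoint: slower, but the call still succeeds.
        return {"status": 200, "lane": "standard", "body": standard(request)}
```

The caller only ever sees a successful response or a 429 for its own key. Which lane served the request is a detail the agent loop never has to handle.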
🙏
So I can create apps with this new Edgee Turbo Model? And also a standard fee of 29.99? No more token manipulation?
Edgee
@chaseforbis98 thanks for your questions, let me clarify a few things:
Turbo Models is a feature inside Edgee, not a single model. It lets you run frontier open-source models (GLM 5.1, Kimi K2.7 Code, Kimi K2.6, MiniMax M2.7) directly inside Claude Code or Codex. So you're not building apps "with Turbo"; you're using Turbo to power your coding assistant while you build whatever apps you want.
Yes, flat $29/month per developer. Not $29.99. The plan includes a generous monthly usage allowance that covers full-time intensive coding for the vast majority of developers. There is a ceiling at the very high end (for context, you'd need to be running agentic loops nearly continuously to hit it), and if you ever get close, we'll talk before anything changes. Transparent and fair.
On "no more token manipulation": you're right that you stop worrying about the per-token meter, which is the big mental shift. But Edgee actually does smart token compression behind the scenes (cutting what gets sent to the model by ~50% on coding sessions), so you get the benefit of token optimization without having to think about it. Set it and forget it.
Hope this clarifies. Happy to go deeper on anything 🙏
flat $29/month instead of usage-based is the right call for anyone running agents that loop unpredictably. the worst part of token-based pricing is never knowing what the bill will be until it's too late. also being able to swap in open-source models without changing any code or rewriting configs removes the biggest barrier to actually trying them. most people stick with what they know because switching is painful, not because alternatives aren't good enough
Edgee
@tina_chhabra Both points are exactly what we kept hearing in customer conversations before we built this.
On flat pricing: agentic workflows are genuinely unpredictable. One day you have a clean refactor, the next your agent is in a 30-minute edit-run-fix loop and you've burned $40. Knowing your monthly ceiling upfront is a different kind of freedom. It changes how teams use the tool because the meter isn't running in their heads.
On switching cost: this is the part most people underestimate. The quality gap between closed and open frontier models on coding tasks is far smaller than the gap in setup friction. We benchmarked GLM 5.1 and Kimi K2.6 Code against Sonnet on real coding sessions for weeks before this launch, and the outputs are genuinely close. But "close on output" doesn't matter if "setup takes a Saturday." So that's the actual battle: zero-config switching.
The whole point of routing through a gateway is that "trying a new model" becomes a 30-second toggle, not a project.
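As a rough illustration of why a gateway makes switching cheap, here is a hypothetical Python sketch (the `Gateway`, `Upstream`, and `client_request` names and the provider labels are assumptions, not Edgee's API). The client keeps asking for the same alias, and only the gateway's routing table changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upstream:
    provider: str  # hypothetical provider label
    model: str     # model name as mentioned in this thread


class Gateway:
    """Hypothetical gateway: maps a stable alias to whichever upstream is active."""

    def __init__(self, routes):
        self._routes = dict(routes)

    def switch(self, alias, upstream):
        # The "toggle": repoint the alias; no client change is needed.
        self._routes[alias] = upstream

    def resolve(self, alias):
        return self._routes[alias]


def client_request(gateway, prompt):
    # Client code is identical before and after a switch.
    target = gateway.resolve("coding")
    return f"{target.provider}/{target.model} <- {prompt}"
```

Calling `gateway.switch("coding", Upstream("provider-b", "Kimi K2.7 Code"))` changes where the next `client_request` lands, without touching the client.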
Thanks for the thoughtful comment 🙏
This is exactly the kind of model switching devs pretend they do not need and then use 12 times a day. Love the practical angle.
Edgee
Haha, thanks @sarveshsea, accurate. I'm the founder and I still catch myself defending my "main" model out of pure habit, then switching three times in the same session.
The honest truth is most devs don't need one perfect model, they need the right one for each task. Turbo just makes that switching free.
Thanks 🙏
Using Edgee with Kimi K2.7, huge savings compared to OpenRouter. Keep going, guys!
Edgee
Haha, thank you @denis_lt, that means a lot.
Kimi K2.7 Code on Turbo is genuinely a sweet spot right now. The combination of the model's coding capability and the speed/price of how it's served changes the calculus for a lot of agentic workflows.
Keep building 🙏
I am using Edgee every day and it is saving me tokens and making my Claude usage more efficient.
Edgee
@olivier_thirion_de_briel Thanks for being one of our core users, it means a lot to us. And yeah, your savings are incredible. Can't wait for you to test our next compressor ;)
Love the flat rate approach for unpredictable agent loops, excited to test Kimi K2.7 Code with this kind of speed. Huge congrats on shipping this, @sachamorard
Edgee
@priya_kushwaha1 You'll see, Kimi K2.7 Code (Turbo version) is really impressive. Looking forward to hearing your feedback.
Tabstack by Mozilla
I can relate. Kimi's AI models are solid options for coding tasks. See this comparison with Opus 4.7. TL;DR: Great for prototyping or exploring a design. Opus remains ahead for work requiring correctness and accuracy.
Also, fun fact: Composer 2.5, Cursor's most recent coding model, is actually based on Kimi's.