Run state-of-the-art open-source models (GLM 5.1, Kimi K2.7 Code, MiniMax M2.7, and more) in Claude Code at up to 4× the speed (up to 200 tok/s) for a flat $29/month. Set up in minutes, no code changes.

Launch tags: Software Engineering • Developer Tools • Artificial Intelligence

Congrats on the launch, Sacha! Curious: how does Edgee handle consistency during peak demand, when multiple teams are hitting the same Turbo endpoints simultaneously?

1d ago

Edgee

Maker

@crystalmei Thanks 🙏

Three layers handle this:

1/ Gateway: requests don't queue locally. Per-provider HTTP/2 connection pools with multiplexing, horizontal scaling, no shared serialization point. A burst from one team doesn't slow down another.

2/ Inference: Turbo runs on dedicated high-throughput infra with our partners (@Together AI, @Fireworks, and others depending on the model). They auto-scale and load-balance across replicas. We've sized partnerships with headroom for peak hours.

3/ Fallback: if a Turbo lane is ever saturated, Edgee routes automatically to a standard endpoint of the same model family. The agent loop never breaks; the user might just see a slightly slower response for that call.

Plus per-key rate limiting at the team level so one heavy user doesn't degrade the experience for the others.
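To make the fallback and per-key limiting ideas concrete, here is a minimal sketch in Python. It is not Edgee's implementation: the `Gateway`, `TokenBucket`, `route`, `LaneSaturated`, and `RateLimited` names, the token-bucket algorithm, and the two demo lanes are all assumptions chosen for illustration. It only shows the general pattern of checking a per-key budget first, then trying the Turbo lane and falling through to a standard lane when the first one reports saturation.

```python
import time
from typing import Callable, Dict, List, Optional


class LaneSaturated(Exception):
    """Raised by a lane that cannot accept more work right now (hypothetical)."""


class RateLimited(Exception):
    """Raised when an API key has used up its request budget (hypothetical)."""


class TokenBucket:
    """Per-key limiter: refills `rate` requests per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated: Optional[float] = None

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


Lane = Callable[[str], str]


def route(prompt: str, lanes: List[Lane]) -> str:
    """Try lanes in order (Turbo first); fall through when a lane is saturated."""
    last_error: Optional[Exception] = None
    for lane in lanes:
        try:
            return lane(prompt)
        except LaneSaturated as exc:
            last_error = exc
    raise RuntimeError("all lanes saturated") from last_error


class Gateway:
    """Applies a per-key budget, then routes with fallback."""

    def __init__(self, lanes: List[Lane], rate: float, capacity: float) -> None:
        self.lanes = lanes
        self.rate = rate
        self.capacity = capacity
        self.buckets: Dict[str, TokenBucket] = {}

    def handle(self, api_key: str, prompt: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        if api_key not in self.buckets:
            self.buckets[api_key] = TokenBucket(self.rate, self.capacity)
        if not self.buckets[api_key].allow(now):
            raise RateLimited(api_key)
        return route(prompt, self.lanes)


# Demo lanes (assumptions): a Turbo lane that is always full, and a standard lane.
def saturated_turbo(prompt: str) -> str:
    raise LaneSaturated("turbo lane full")


def standard(prompt: str) -> str:
    return f"standard:{prompt}"
```

With `Gateway([saturated_turbo, standard], rate=1.0, capacity=2.0)`, a call from one key is served by the standard lane because the Turbo lane raises `LaneSaturated`. A third call from the same key in the same instant raises `RateLimited`, while a different key is unaffected because each key has its own bucket.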

🙏

24h ago

So I can create apps with this new Edgee Turbo model? And also a standard fee of $29.99? No more token manipulation?

1d ago

Edgee

Maker

@chaseforbis98 thanks for your questions, let me clarify a few things:

Turbo Models is a feature inside Edgee, not a single model. It lets you run frontier open-source models (GLM 5.1, Kimi K2.7 Code, Kimi K2.6, MiniMax M2.7) directly inside Claude Code or Codex. So you're not building apps "with Turbo"; you're using Turbo to power your coding assistant when you build whatever apps you want.

Yes, flat $29/month per developer. Not $29.99. The plan includes a generous monthly usage allowance that covers full-time intensive coding for the vast majority of developers. There is a ceiling at the very high end (for context, you'd need to be running agentic loops nearly continuously to hit it), and if you ever get close, we'll talk before anything changes. Transparent and fair.

On "no more token manipulation": you're right that you stop worrying about the per-token meter, which is the big mental shift. But Edgee actually does smart token compression behind the scenes (cutting what gets sent to the model by ~50% on coding sessions), so you get the benefit of token optimization without having to think about it. Set it and forget it.

Hope this clarifies. Happy to go deeper on anything 🙏

1d ago

Flat $29/month instead of usage-based is the right call for anyone running agents that loop unpredictably. The worst part of token-based pricing is never knowing what the bill will be until it's too late. Also, being able to swap in open-source models without changing any code or rewriting configs removes the biggest barrier to actually trying them. Most people stick with what they know because switching is painful, not because alternatives aren't good enough.

1d ago

Edgee

Maker

@tina_chhabra Both points are exactly what we kept hearing in customer conversations before we built this.

On flat pricing: agentic workflows are genuinely unpredictable. One day you have a clean refactor, the next your agent is in a 30-minute edit-run-fix loop and you've burned $40. Knowing your monthly ceiling upfront is a different kind of freedom. It changes how teams use the tool because the meter isn't running in their heads.
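As a rough illustration of that arithmetic, the sketch below compares a metered bill against the flat fee. Only the $29/month figure and the $40 loop come from this thread; the function names and the per-million-token prices in the usage note are hypothetical placeholders, not any provider's real rates.

```python
import math

FLAT_MONTHLY_USD = 29.0  # from the thread: flat $29/month per developer


def metered_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Cost of one session under per-token pricing (prices are per million tokens)."""
    total = input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    return total / 1_000_000


def sessions_to_break_even(cost_per_session_usd: float) -> int:
    """Number of sessions at which metered spend first reaches the flat fee."""
    return math.ceil(FLAT_MONTHLY_USD / cost_per_session_usd)
```

For example, with placeholder prices of $3 and $15 per million input and output tokens, a session with 2,000,000 input tokens and 100,000 output tokens costs $7.50 metered, so the fourth such session in a month passes the flat fee. A single $40 loop like the one described above passes it in one session.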

On switching cost: this is the part most people underestimate. The quality gap between closed and open frontier models on coding tasks is far smaller than the gap in setup friction. We benchmarked GLM 5.1 and Kimi K2.6 Code against Sonnet on real coding sessions for weeks before this launch, and the outputs are genuinely close. But "close on output" doesn't matter if "setup takes a Saturday." So that's the actual battle: zero-config switching.

The whole point of routing through a gateway is that "trying a new model" becomes a 30-second toggle, not a project.

Thanks for the thoughtful comment 🙏

1d ago

This is exactly the kind of model switching devs pretend they do not need and then use 12 times a day. Love the practical angle.

1d ago

Edgee

Maker

Haha, thanks @sarveshsea, accurate. I'm the founder and I still catch myself defending my "main" model out of pure habit, then switching three times in the same session.

The honest truth is most devs don't need one perfect model, they need the right one for each task. Turbo just makes that switching free.

Thanks 🙏

1d ago

Using Edgee with Kimi K2.7, huge savings compared to OpenRouter. Keep going, guys!

1d ago

Edgee

Maker

Haha, thank you @denis_lt, that means a lot.

Kimi K2.7 Code on Turbo is genuinely a sweet spot right now. The combination of the model's coding capability and the speed/price of how it's served changes the calculus for a lot of agentic workflows.

Keep building 🙏

1d ago

I am using Edgee every day, and it is saving me tokens and making my Claude usage more efficient.

20h ago

Edgee

Maker

@olivier_thirion_de_briel Thanks for being one of our core users, it means a lot to us. And yeah, your savings are incredible. Can't wait for you to test our next compressor ;)

20h ago

Love the flat rate approach for unpredictable agent loops, excited to test Kimi K2.7 Code with this kind of speed. Huge congrats on shipping this, @sachamorard

2d ago

Edgee

Maker

@priya_kushwaha1 You'll see, Kimi K2.7 Code (Turbo version) is really impressive. Looking forward to your feedback.

2d ago

Tabstack by Mozilla

Hunter

I can relate. Kimi's AI models are solid options for coding tasks. See this comparison with Opus 4.7. TL;DR: great for prototyping or exploring a design. Opus remains ahead for work requiring correctness and accuracy.

Also, fun fact: Composer 2.5, Cursor's most recent coding model, is actually based on a Kimi model.

1d ago

Previous Edgee Launches

Edgee Fallback Models: Claude Code that never stops

Launched on May 24th, 2026

Edgee Team: Strava for your coding assistants

Launched on April 26th, 2026

Edgee Codex Compressor: Use Codex at 35.6% lower costs

Launched on April 12th, 2026

Edgee Claude Code Compressor: Extend Claude Pro's limit by 26.2%

Launched on March 22nd, 2026

Forum Threads

p/edgee • 4mo ago

Token Compression for LLMs: How to reduce context size without losing accuracy

Hey, I'm Sacha, co-founder at @Edgee

Over the last few months, we've been working on a problem we kept seeing in production AI systems:

LLM costs don't scale linearly with usage, they scale with context.
As teams add RAG, tool calls, long chat histories, memory, and guardrails, prompts become huge and token spend quickly becomes the main bottleneck.

So we built a token compression layer designed to run before inference.
