Run state-of-the-art open-source models (GLM 5.1, Kimi K2.7 Code, MiniMax M2.7, and more) in Claude Code at up to 4× the speed (up to 200 tok/s) for a flat $29/month. Set up in minutes, no code changes.

Launch tags: Software Engineering • Developer Tools • Artificial Intelligence

Edgee

Maker

📌

Hey Product Hunt 👋

Sacha here, co-founder of Edgee.

Story time. A few weeks ago I was working with Claude Code on a refactor with Opus. The model knew exactly what to do, but I sat there watching a 500-line file crawl out one token at a time. Two minutes for one file. Multiply that by every step of the agent loop and you realize: speed is the silent tax on every coding session.

Around the same time I started testing open-source models like GLM and Kimi K2.7. The quality on coding tasks was honestly impressive.

But on standard endpoints they were even slower than the closed models. And the setup was painful: API keys, code changes, a CLAUDE.md to rewrite, MCP servers to reconfigure.

That's the problem we built Edgee Turbo Models to solve.

What it does:

→ Run frontier open-source models (GLM 5.1, Kimi K2.7 Code, Kimi K2.6, MiniMax M2.7) directly in Claude Code.

→ At up to 4x the speed of standard endpoints (~200 tok/s vs ~50).

→ Flat $29/month. No metered token bill that climbs as your agents work harder.

→ Setup in 2 minutes. Your CLAUDE.md, MCP servers, and entire setup stay exactly where they are.
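As a rough illustration of what the throughput figures in the list mean in practice, here is a back-of-the-envelope sketch. The 50 and 200 tok/s rates are the approximate numbers quoted above; the figure of about 12 tokens per line is an assumption picked for illustration, not an Edgee measurement.

```python
def generation_seconds(lines: int, tokens_per_line: float, tokens_per_second: float) -> float:
    """Estimate the wall-clock time needed to stream a file of `lines` lines."""
    return lines * tokens_per_line / tokens_per_second


TOKENS_PER_LINE = 12  # assumption for illustration only

# A 500-line file at the two approximate rates quoted in the post.
standard = generation_seconds(500, TOKENS_PER_LINE, 50)
turbo = generation_seconds(500, TOKENS_PER_LINE, 200)

print(f"standard: {standard:.0f}s, turbo: {turbo:.0f}s")
# prints "standard: 120s, turbo: 30s"
```

Under that assumption, the two-minute wait for a 500-line file drops to about thirty seconds, and the saving repeats on every step of an agent loop.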

Important point I want to get out front because it'll come up:

Turbo is NOT a smaller or quantized version of these models. They are the full open-weight checkpoints. Turbo only changes how they are served, on dedicated high-throughput inference infrastructure built for raw speed, not a shared best-effort endpoint. Same outputs, just faster.

How this fits with our previous launches:

- Compression: use fewer tokens per request

- Teams: see who uses what, per repo, per PR

- Fallback Models: keep working when Claude or Copilot hit limits

- Turbo Models: run open-source models at premium speed, for flat pricing

Together that is the Route + Compress + Observe stack of our Agent Gateway. Today we're shipping the speed layer.

Why now: The Economist published a piece this week confirming that "token-maxxing is over" and that companies are routing to cheaper models. Open-source models are clearly part of the answer. Turbo makes them actually usable.

A few questions I'd love your feedback on:

→ Which open-source coding model are you most curious to try?

→ Is flat $29/month the right price point, or would you prefer usage-based?

→ What other models should we add to the Turbo lineup?

Will be in comments all day. Thanks for checking it out 🙏

20h ago

Tabstack by Mozilla

Hunter

@sachamorard @Product Hunt is about consistency. Shout-out for this new launch! Keep up the great work, and keep launching 👏👏

3h ago

Being able to run different models through Claude Code is really cool. Can you switch between models mid-session, or is it set per project?

34m ago

Edgee

Maker

@doganakbulut You can switch whenever you want. Be aware that switching mid-session causes a cache miss at the provider level, so we recommend waiting for the next session if you want to switch!

22m ago

Proxying Claude Code's API calls through a gateway to route to Kimi K2.7 or MiniMax without code changes is clean architecture. We've hit throughput ceilings in agentic workflows where task latency compounds fast, so the 4x speed claim is interesting. Does Edgee handle automatic fallback if a model hits rate limits mid-session?

2h ago

Edgee

Maker

@anand_thakkar1 Yes, exactly! Edgee is able to fall back to the model/provider you choose.
Even without the turbo models, you can use it with Claude and fall back to Kimi when your usage limit is reached (or when Anthropic has an incident), for example.
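For readers wondering what a fallback like this looks like in principle, here is a minimal sketch of ordered fallback. It is not Edgee's implementation: `RateLimited`, `complete_with_fallback`, `primary`, and `secondary` are all hypothetical names, and the two providers are stubs standing in for real model calls.

```python
from typing import Callable, Optional, Sequence


class RateLimited(Exception):
    """Hypothetical error a provider raises when its usage limit is hit."""


def complete_with_fallback(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order, moving to the next when one is rate limited."""
    last_error: Optional[Exception] = None
    for call in providers:
        try:
            return call(prompt)
        except RateLimited as err:
            last_error = err  # remember the failure and try the next provider
    raise RuntimeError("all providers are rate limited") from last_error


def primary(prompt: str) -> str:
    # Stub: simulates a provider whose usage limit has been reached.
    raise RateLimited("usage limit reached")


def secondary(prompt: str) -> str:
    # Stub: simulates the fallback model answering instead.
    return f"fallback answer to: {prompt}"


print(complete_with_fallback("rename foo to bar", [primary, secondary]))
# prints "fallback answer to: rename foo to bar"
```

The order of the list is the whole policy here: the caller decides which model is preferred and which one takes over.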

1h ago

flat $29/month instead of usage-based is the right call for anyone running agents that loop unpredictably. the worst part of token-based pricing is never knowing what the bill will be until it's too late. also being able to swap in open-source models without changing any code or rewriting configs removes the biggest barrier to actually trying them. most people stick with what they know because switching is painful, not because alternatives aren't good enough

1h ago

Edgee

Maker

@tina_chhabra Both points are exactly what we kept hearing in customer conversations before we built this.

On flat pricing: agentic workflows are genuinely unpredictable. One day you have a clean refactor, the next your agent is in a 30-minute edit-run-fix loop and you've burned $40. Knowing your monthly ceiling upfront is a different kind of freedom. It changes how teams use the tool because the meter isn't running in their heads.

On switching cost: this is the part most people underestimate. The quality gap between closed and open frontier models on coding tasks is far smaller than the gap in setup friction. We benchmarked GLM 5.1 and Kimi K2.6 Code against Sonnet on real coding sessions for weeks before this launch, and the outputs are genuinely close. But "close on output" doesn't matter if "setup takes a Saturday." So that's the actual battle: zero-config switching.

The whole point of routing through a gateway is that "trying a new model" becomes a 30-second toggle, not a project.

Thanks for the thoughtful comment 🙏

1h ago

Love the flat rate approach for unpredictable agent loops, excited to test Kimi K2.7 Code with this kind of speed. Huge congrats on shipping this, @sachamorard

6h ago

Edgee

Maker

@priya_kushwaha1 You'll see, Kimi K2.7 Code (turbo version) is really impressive. Looking forward to your feedback.

6h ago

Tabstack by Mozilla

Hunter

I can relate. Kimi's AI models are solid options for coding tasks. See this comparison with Opus 4.7. TL;DR: Great for prototyping or exploring a design. Opus remains ahead for work requiring correctness and accuracy.

Also, fun fact: Composer 2.5, Cursor's most recent coding model, is actually based on Kimi's.

3h ago

Previous Edgee Launches

Edgee Fallback Models: Claude Code that never stops

Launched on May 24th, 2026

Edgee Team: Strava for your coding assistants

Launched on April 26th, 2026

Edgee Codex Compressor: Use Codex at 35.6% lower costs

Launched on April 12th, 2026

Edgee Claude Code Compressor: Extend Claude Pro's limit by 26.2%

Launched on March 22nd, 2026

Forum Threads

p/edgee • 4mo ago

Token Compression for LLMs: How to reduce context size without losing accuracy

Hey, I'm Sacha, co-founder at @Edgee

Over the last few months, we've been working on a problem we kept seeing in production AI systems:

LLM costs don't scale linearly with usage, they scale with context.
As teams add RAG, tool calls, long chat histories, memory, and guardrails, prompts become huge and token spend quickly becomes the main bottleneck.

So we built a token compression layer designed to run before inference.

Edgee

The Agent Gateway that TL;DR tokens
