One gateway for cheaper, faster, unstoppable coding agents

Start new thread

Edgee Fallback Models - Claude Code that never stops

Kilo Code

•2mo ago

Your Claude Code session shouldn't die when Anthropic goes down or your plan runs out. Edgee Fallback Models keeps coding assistants running by routing to alternative models like Kimi K2.6, Gemma, GLM, or Qwen when Claude is unavailable, rate-limited, or just too expensive. Or one-click fallback to your own Bedrock, Vertex, or Azure account. Same Claude Code, different backend, zero code changes. Built for teams that can't afford to stop shipping.

Replies

Best

Edgee

Maker

📌

Hey friends

Sacha here, founder of @Edgee .

Two weeks ago Anthropic announced that starting June 15, your programmatic Claude usage gets capped at a $20-$200 monthly credit pool. For heavy Claude Code users, that's roughly a 25 to 40x cut in effective inference.
Same with Copilot that is moving to usage-based pricing June 1st.

A lot of people are angry about it. I get it. But we're builders, and the right answer to a market change is to ship better tools, not to complain.

We started building Fallback Models the week before Anthropic's announcement, after one too many Anthropic outages. The timing is now coincidentally perfect.

Here's what our Fallback Models feature does:

→ Anthropic down? Route to Kimi K2.6, GLM, Qwen, Gemma, or others.

→ Plan limit hit? Same thing, automatically.

→ Want to route always? Pick your model.

You can also fall back to your own Bedrock, Vertex, or Azure account in one click. Same Claude Code on top, your cloud underneath, zero code changes.

And it works the same with Copilot, Codex...

How it fits with our other features:

- Compression: use fewer tokens

- Teams: see who uses tokens and on what

- Fallback Models: keep working when your primary model can't

Fallback Models ships with our Team plan. The compression engine that powers all of it is free to try, no credit card.

Two questions for you:

- Which fallback models would you actually want to use?

- What other failure modes should your coding assistant handle?

Will be in comments all day 🙏

edgee.ai/fallback-models

Report

2mo ago

Kilo Code

Hunter

@sachamorard bravo for this new launch - keep up the great work, keep launching

Report

2mo ago

@fmerian Love the "compress" part. Most fallback tools just switch models and you lose half the context. Do you recompress the conversation history before sending to Kimi/Qwen, or do you keep the full context and let the model handle it? If this works well, it could cut my AI bill in half. Upvoted.

Report

2mo ago

Edgee

Maker

@olivier_jury yep, the compression is made on the full conversation, at each request… and really fast ;)

Report

2mo ago

@sachamorard Upvoted! The rate limit issue in Claude Code is real, and the automatic fallback with context compression is exactly what was needed.

Looking forward to testing this on a large project to see if it holds up.

Report

2mo ago

Foyer

The auto-fallback when rate limits kick in is the part I always end up wiring by hand. Good luck with the launch!

Report

2mo ago

Edgee

Maker

@fberrez1 thank you very much. We had this problem in the team, so we fixed it ;)

Report

2mo ago

Kilo Code

Hunter

We had this problem in the team, so we fixed it

@sachamorard love it!

Report

2mo ago

AISA AI Skills Test

smart approach to a real pain point. the rate limiting on Claude Code during peak hours has killed my flow more times than id like to admit. curious how the token compression affects output quality though — does it handle long context windows well or is there a tradeoff with the 50% savings?

Report

2mo ago

Edgee

Maker

Hello @ozandag. Good question. Token compression is deterministic and lossless. As it removes pollution, it does not alter the LLM result. But you know, the best way to check it is to try ;)

Report

2mo ago

The transparent proxy approach here is clever. Intercepting at the API layer means zero client changes, and that matters. We've burned time at RetainSure debugging failures partway through a session when Claude's rate limits kicked in at the worst moments. How do you normalize tool_use schemas across models? Claude's format doesn't map cleanly to Qwen or Gemma, and that mismatch can quietly degrade agent output.

Report

2mo ago

Edgee

Maker

@anand_thakkar1 you pinpointed one of the most complicated part of our job. Translating an Anthropic request into a Qwen/Kimi… compatible semantic, that’s our secret sauce. Sorry, I have to keep it secret 🤫

Report

2mo ago

Mailwarm

Congrats on the launch!! This solves a real issue for developers who can’t afford downtime when Claude is rate limited or down. Keeping coding running with simple fallback models will make workflow feel more stable.

Report

2mo ago

Fallback models for Claude Code is exactly what's needed hitting a rate limit mid-task and losing context is painful. Does it maintain the full context when switching models or does the fallback start fresh?

Report

2mo ago

@imad_elkhafi we do maintain the context ! otherwise the feature would be less useful !

Report

2mo ago

The fallback angle is practical for agent workflows, especially when a coding session is mid-task and the provider limit hits. I’d be curious how you surface model switches in logs, since silent fallbacks can make debugging output differences harder.

Report

2mo ago

Sipcode

Hey Sacha, went through Edgee Fallback's page and the "your Claude Code session shouldn't die when Anthropic goes down" framing is exactly the pain I've been living with this month. one thing I wanted to ask, when you fall back to Kimi or GLM mid-session, are you replaying the full context or doing a smarter summarization handoff? the model switch is the part I'd want to understand for long sessions.

Report

2mo ago

Interesting - the pain point is real: coding agents are now operational dependencies, so provider limits and outages become workflow risk. The part I'm interested in is not whether the session keeps running, but whether the fallback path preserves intent, tool-use behaviour, and 'reviewability'. A model switch that silently changes judgement would be worse than a hard stop unless teams have good evals and clear evidence around what changed. I need to give this a proper try!

Report

2mo ago