Edgee Fallback Models - Claude Code that never stops
Your Claude Code session shouldn't die when Anthropic goes down or your plan runs out. Edgee Fallback Models keeps coding assistants running by routing to alternative models like Kimi K2.6, Gemma, GLM, or Qwen when Claude is unavailable, rate-limited, or just too expensive. Or one-click fallback to your own Bedrock, Vertex, or Azure account. Same Claude Code, different backend, zero code changes. Built for teams that can't afford to stop shipping.


Replies
Edgee
Hey friends
Sacha here, founder of @Edgee .
Two weeks ago Anthropic announced that starting June 15, your programmatic Claude usage gets capped at a $20-$200 monthly credit pool. For heavy Claude Code users, that's roughly a 25 to 40x cut in effective inference.
Same with Copilot, which is moving to usage-based pricing on June 1st.
A lot of people are angry about it. I get it. But we're builders, and the right answer to a market change is to ship better tools, not to complain.
We started building Fallback Models the week before Anthropic's announcement, after one too many Anthropic outages. The timing is now coincidentally perfect.
Here's what our Fallback Models feature does:
→ Anthropic down? Route to Kimi K2.6, GLM, Qwen, Gemma, or others.
→ Plan limit hit? Same thing, automatically.
→ Want to route always? Pick your model.
You can also fall back to your own Bedrock, Vertex, or Azure account in one click. Same Claude Code on top, your cloud underneath, zero code changes.
And it works the same with Copilot, Codex...
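For readers who want a concrete picture of the routing described above, here is a minimal Python sketch of an ordered fallback chain. It is not Edgee's implementation: the `ProviderError` class, the `call` function, the set of status codes treated as "unavailable or rate-limited", and the model labels are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a model provider returns an HTTP error."""

    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


# Assumption: these statuses mean "down, overloaded, or rate-limited".
FALLBACK_STATUSES = {429, 500, 502, 503, 504, 529}


def complete_with_fallback(
    prompt: str,
    chain: Sequence[str],
    call: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, response)."""
    last_error: Optional[ProviderError] = None
    for model in chain:
        try:
            return model, call(model, prompt)
        except ProviderError as err:
            # Errors such as a malformed request would fail on every
            # backend, so only availability errors move down the chain.
            if err.status not in FALLBACK_STATUSES:
                raise
            last_error = err
    raise RuntimeError("every model in the chain failed") from last_error
```

With a chain such as `["claude", "kimi", "qwen"]` (hypothetical labels), a 429 from the first entry moves the request to the second, while a 400 is raised immediately instead of being retried elsewhere.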
How it fits with our other features:
- Compression: use fewer tokens
- Teams: see who uses tokens and on what
- Fallback Models: keep working when your primary model can't
Fallback Models ships with our Team plan. The compression engine that powers all of it is free to try, no credit card.
Two questions for you:
- Which fallback models would you actually want to use?
- What other failure modes should your coding assistant handle?
Will be in comments all day 🙏
edgee.ai/fallback-models
Tabstack by Mozilla
@sachamorard bravo for this new launch - keep up the great work, keep launching
Edgee
@sachamorard Upvoted! The rate limit issue in Claude Code is real, and the automatic fallback with context compression is exactly what was needed.
Looking forward to testing this on a large project to see if it holds up.
RiteKit Company Logo API
Can we set the sequence of fallbacks? See, I'd love to give you a sequence of the LLMs I don't pay for and then, as a last resort, OpenAI and Grok can squeeze the last of my life blood out of me. Thanks.
Edgee
@osakasaul Fallbacks are indeed chainable, yes! Do you have a quick idea of which LLMs you might be talking about?
RiteKit Company Logo API
@nicolasgirdt Probably about 3-4 of the ones you list on the site, with Grok, Gemini, and ChatGPT at the end, since, like Claude, I pay through the nose for them. Right now I'm optimizing with Claude Code first; we'll see how it goes.
The auto-fallback when rate limits kick in is the part I always end up wiring by hand. Good luck with the launch!
Edgee
Tabstack by Mozilla
@sachamorard love it!
AISA AI Skills Test
smart approach to a real pain point. the rate limiting on Claude Code during peak hours has killed my flow more times than I'd like to admit. curious how the token compression affects output quality though: does it handle long context windows well, or is there a tradeoff with the 50% savings?
Edgee
The transparent proxy approach here is clever. Intercepting at the API layer means zero client changes, and that matters. We've burned time at RetainSure debugging failures partway through a session when Claude's rate limits kicked in at the worst moments. How do you normalize tool_use schemas across models? Claude's format doesn't map cleanly to Qwen or Gemma, and that mismatch can quietly degrade agent output.
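To make the schema question concrete, here is a hedged Python sketch of one possible normalization step. It assumes the fallback model is served through an OpenAI-style chat completions endpoint, which the thread does not confirm, and it is not a description of how Edgee does it. It maps an Anthropic tool definition (`name`, `description`, `input_schema`) to the OpenAI function format, and maps a returned tool call back to an Anthropic `tool_use` block. Both function names are hypothetical.

```python
import json
from typing import Any, Dict


def anthropic_tool_to_openai(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an Anthropic tool definition to an OpenAI-style function tool."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # Both formats carry a JSON Schema object here.
            "parameters": tool["input_schema"],
        },
    }


def openai_tool_call_to_anthropic(call: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an OpenAI-style tool call to an Anthropic tool_use block."""
    return {
        "type": "tool_use",
        "id": call["id"],
        "name": call["function"]["name"],
        # OpenAI-style calls carry arguments as a JSON string,
        # while tool_use blocks carry a parsed object.
        "input": json.loads(call["function"]["arguments"] or "{}"),
    }
```

The field mapping is the easy part. A model that emits arguments violating the schema would still need validation before the result reaches the agent, and that is where quiet degradation tends to show up.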
Edgee
Mailwarm
Congrats on the launch!! This solves a real issue for developers who can’t afford downtime when Claude is rate limited or down. Keeping coding running with simple fallback models will make workflows feel more stable.
Fallback models for Claude Code is exactly what's needed. Hitting a rate limit mid-task and losing context is painful. Does it maintain the full context when switching models, or does the fallback start fresh?
@imad_elkhafi We do maintain the context! Otherwise the feature would be less useful!
The fallback angle is practical for agent workflows, especially when a coding session is mid-task and the provider limit hits. I’d be curious how you surface model switches in logs, since silent fallbacks can make debugging output differences harder.
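One way to keep fallbacks from being silent is to emit a structured event on every switch. The Python sketch below shows the idea. The event fields, the `switch_event` helper, and the logger name are hypothetical and say nothing about what Edgee actually logs.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Dict

logger = logging.getLogger("fallback")  # hypothetical logger name


def switch_event(
    session_id: str, from_model: str, to_model: str, reason: str
) -> Dict[str, str]:
    """Build and log one JSON record per model switch."""
    event = {
        "event": "model_switch",
        "session_id": session_id,
        "from_model": from_model,
        "to_model": to_model,
        "reason": reason,  # e.g. "rate_limited" or "provider_down"
        "at": datetime.now(timezone.utc).isoformat(),
    }
    # WARNING level so the switch shows up under default logging config.
    logger.warning(json.dumps(event))
    return event
```

Filtering logs by `session_id` then shows which responses came from which model, which helps when output quality changes partway through a session.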
Hey Sacha, went through Edgee Fallback's page and the "your Claude Code session shouldn't die when Anthropic goes down" framing is exactly the pain I've been living with this month. one thing I wanted to ask, when you fall back to Kimi or GLM mid-session, are you replaying the full context or doing a smarter summarization handoff? the model switch is the part I'd want to understand for long sessions.