fmerian

Edgee Fallback Models - Claude Code that never stops

Your Claude Code session shouldn't die when Anthropic goes down or your plan runs out. Edgee Fallback Models keeps coding assistants running by routing to alternative models like Kimi K2.6, Gemma, GLM, or Qwen when Claude is unavailable, rate-limited, or just too expensive. Or fall back to your own Bedrock, Vertex, or Azure account in one click. Same Claude Code, different backend, zero code changes. Built for teams that can't afford to stop shipping.

Sacha MORARD

Hey friends

Sacha here, founder of @Edgee .

Two weeks ago Anthropic announced that starting June 15, your programmatic Claude usage gets capped at a $20-$200 monthly credit pool. For heavy Claude Code users, that's roughly a 25 to 40x cut in effective inference.
The same goes for Copilot, which is moving to usage-based pricing on June 1st.

A lot of people are angry about it. I get it. But we're builders, and the right answer to a market change is to ship better tools, not to complain.

We started building Fallback Models the week before Anthropic's announcement, after one too many Anthropic outages. The timing is now coincidentally perfect.

Here's what our Fallback Models feature does:

→ Anthropic down? Route to Kimi K2.6, GLM, Qwen, Gemma, or others.

→ Plan limit hit? Same thing, automatically.

→ Want to route always? Pick your model.

You can also fall back to your own Bedrock, Vertex, or Azure account in one click. Same Claude Code on top, your cloud underneath, zero code changes.
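For readers picturing the three cases above (provider down, plan limit hit, always route), here is a minimal Python sketch of the decision. It is not Edgee's implementation. The `Reason` enum, the `fallback_reason` function, and the `always_route` flag are hypothetical names, and the sketch assumes the primary provider signals rate limits with HTTP 429 and outages or overload with 5xx status codes.

```python
from enum import Enum
from typing import Optional


class Reason(Enum):
    """Why a request is sent to a fallback model (hypothetical labels)."""
    PROVIDER_DOWN = "provider_down"
    LIMIT_HIT = "limit_hit"
    ALWAYS = "always"


def fallback_reason(status: int, always_route: bool = False) -> Optional[Reason]:
    """Return the reason to use a fallback, or None to stay on the primary model.

    Assumption: 429 means a rate or plan limit, any 5xx means the provider
    is down or overloaded. A real proxy would likely inspect error bodies too.
    """
    if always_route:
        return Reason.ALWAYS
    if status == 429:
        return Reason.LIMIT_HIT
    if status >= 500:
        return Reason.PROVIDER_DOWN
    return None
```

Returning a reason rather than a plain boolean keeps the cause of each switch available for later logging or billing.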

And it works the same with Copilot, Codex...

How it fits with our other features:

- Compression: use fewer tokens

- Teams: see who uses tokens and on what

- Fallback Models: keep working when your primary model can't

Fallback Models ships with our Team plan. The compression engine that powers all of it is free to try, no credit card.

Two questions for you:

- Which fallback models would you actually want to use?

- What other failure modes should your coding assistant handle?

Will be in comments all day 🙏

edgee.ai/fallback-models

fmerian

@sachamorard bravo for this new launch - keep up the great work, keep launching

Olivier Jury
@fmerian Love the "compress" part. Most fallback tools just switch models and you lose half the context. Do you recompress the conversation history before sending to Kimi/Qwen, or do you keep the full context and let the model handle it? If this works well, it could cut my AI bill in half. Upvoted.
Sacha MORARD
@olivier_jury yep, compression runs on the full conversation at each request… and it's really fast ;)
Olivier Jury

@sachamorard Upvoted! The rate limit issue in Claude Code is real, and the automatic fallback with context compression is exactly what was needed.

Looking forward to testing this on a large project to see if it holds up.

Saul Fleischman

Can we set the sequence of fallbacks? See, I'd love to give you a sequence of the LLMs I don't pay for and then, as a last resort, OpenAI and Grok can squeeze the last of my lifeblood out of me. Thanks.

Nicolas Girardot

@osakasaul Fallbacks are indeed chainable, yes! Do you have a rough idea of which LLMs you're thinking of?
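A chained fallback like the one asked about can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Edgee's configuration format or code: `ProviderUnavailable`, `complete_with_chain`, and the stub backends are hypothetical, and the model labels are placeholders rather than real model IDs.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised by a backend that is down or rate-limited (hypothetical)."""


Backend = Callable[[str], str]


def complete_with_chain(prompt: str, chain: Sequence[tuple[str, Backend]]) -> tuple[str, str]:
    """Try each (name, backend) in order and return (name, reply) from the first success."""
    errors = []
    for name, backend in chain:
        try:
            return name, backend(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")  # remember why this link failed
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stub backends standing in for real providers.
def down(_prompt: str) -> str:
    raise ProviderUnavailable("rate limited")


def echo(prompt: str) -> str:
    return f"ok: {prompt}"


# Cheaper models first, paid-per-token providers as the last resort.
chain = [("claude", down), ("kimi", down), ("qwen", echo), ("grok", echo)]
print(complete_with_chain("hi", chain))  # ('qwen', 'ok: hi')
```

The order of the list is the whole policy: the first backend that answers wins, so the expensive providers at the end are only reached when everything before them fails.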

Saul Fleischman

@nicolasgirdt Probably about 3-4 of the ones you list on the site, and Grok, Gemini, ChatGPT at the end, since like Claude, I pay through the nose for them. Right now, optimizing with Claude Code first, we'll see how it goes.

Florent Berrez

The auto-fallback when rate limits kick in is the part I always end up wiring by hand. Good luck with the launch!

Sacha MORARD
@fberrez1 thank you very much. We had this problem in the team, so we fixed it ;)
fmerian

"We had this problem in the team, so we fixed it"

@sachamorard love it!

Ozan

Smart approach to a real pain point. The rate limiting on Claude Code during peak hours has killed my flow more times than I'd like to admit. Curious how the token compression affects output quality though: does it handle long context windows well, or is there a tradeoff with the 50% savings?

Sacha MORARD
Hello @ozandag. Good question. Token compression is deterministic and lossless. As it removes pollution, it does not alter the LLM result. But you know, the best way to check it is to try ;)
Anand Thakkar

The transparent proxy approach here is clever. Intercepting at the API layer means zero client changes, and that matters. We've burned time at RetainSure debugging failures partway through a session when Claude's rate limits kicked in at the worst moments. How do you normalize tool_use schemas across models? Claude's format doesn't map cleanly to Qwen or Gemma, and that mismatch can quietly degrade agent output.

Sacha MORARD
@anand_thakkar1 you pinpointed one of the most complicated parts of our job. Translating an Anthropic request into Qwen/Kimi…-compatible semantics, that’s our secret sauce. Sorry, I have to keep it secret 🤫
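Edgee's translation layer is not public, so the following is only a sketch of the simplest, well-documented part of the problem: reshaping an Anthropic tool definition and a `tool_use` content block into the OpenAI-style function-calling shape. It assumes the fallback model is served behind an OpenAI-compatible endpoint, which may not hold for every provider, and the function names `tool_def_to_openai` and `tool_use_to_openai` are hypothetical. The harder semantic differences the question raises are not covered here.

```python
import json
from typing import Any


def tool_def_to_openai(tool: dict[str, Any]) -> dict[str, Any]:
    """Map an Anthropic tool definition (name, description, input_schema)
    to an OpenAI-style function tool (name, description, parameters)."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],  # both sides use JSON Schema
        },
    }


def tool_use_to_openai(block: dict[str, Any]) -> dict[str, Any]:
    """Map an Anthropic tool_use content block to an OpenAI-style tool call.
    Anthropic carries arguments as an object; OpenAI expects a JSON string."""
    return {
        "id": block["id"],
        "type": "function",
        "function": {
            "name": block["name"],
            "arguments": json.dumps(block["input"], sort_keys=True),
        },
    }


anthropic_tool = {
    "name": "read_file",
    "description": "Read a file from the workspace",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}
print(tool_def_to_openai(anthropic_tool)["function"]["name"])  # read_file
```

Even in this easy case the argument encoding differs (object versus JSON string), which is the kind of mismatch that breaks an agent loop if it is handled loosely.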
Thami Benjelloun

Congrats on the launch!! This solves a real issue for developers who can’t afford downtime when Claude is rate limited or down. Keeping coding sessions running with simple fallback models will make workflows feel more stable.

IMAD EL KHAFI

Fallback models for Claude Code are exactly what's needed: hitting a rate limit mid-task and losing context is painful. Does it maintain the full context when switching models, or does the fallback start fresh?

Clément Bouvet

@imad_elkhafi we do maintain the context! Otherwise the feature would be less useful!

Jimmy Lee

The fallback angle is practical for agent workflows, especially when a coding session is mid-task and the provider limit hits. I’d be curious how you surface model switches in logs, since silent fallbacks can make debugging output differences harder.
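One way to keep fallbacks from being silent is to emit a structured log line on every switch. This is a minimal Python sketch using the standard `logging` and `json` modules. It does not describe how Edgee surfaces switches, and `log_model_switch` and its field names are hypothetical.

```python
import json
import logging

logger = logging.getLogger("fallback")


def log_model_switch(request_id: str, from_model: str, to_model: str, reason: str) -> str:
    """Log one JSON record per model switch and return it for inspection."""
    record = json.dumps(
        {
            "event": "model_switch",
            "request_id": request_id,
            "from": from_model,
            "to": to_model,
            "reason": reason,
        },
        sort_keys=True,
    )
    logger.warning(record)  # warning level so switches stand out in default configs
    return record
```

Tagging each record with a request ID makes it possible to line up a change in output quality with the exact request where the backend changed.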

Anuj ojha

Hey Sacha, went through Edgee Fallback's page and the "your Claude Code session shouldn't die when Anthropic goes down" framing is exactly the pain I've been living with this month. One thing I wanted to ask: when you fall back to Kimi or GLM mid-session, are you replaying the full context or doing a smarter summarization handoff? The model switch is the part I'd want to understand for long sessions.
