Launching today

modelchain
Cost-aware LLM router with streaming and tool calls
Drop-in, zero-dependency LLM router that routes prompts across OpenAI, Anthropic, Gemini and any HTTP endpoint by cost, latency and observed quality.

I built takk/modelchain because picking the right LLM for each request, and re-implementing streaming, tool calling and failover for every provider, kept leaking into every call site.
It sits between your app and any number of providers (OpenAI, Anthropic, Gemini, or any OpenAI-compatible HTTP endpoint). You declare a pool of providers with their costs and keys, and it routes each request using one of 7 strategies (cost-first, quality-first, cost-then-quality, latency-first, weighted, round-robin, sequential-fallback). It streams responses as a normalised AsyncIterable, normalises tool calls across providers, and enforces hard budget ceilings.
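To make the routing step concrete, here is a minimal TypeScript sketch of three of the seven strategies plus a hard budget check. It is not modelchain's API: `Provider`, `costPer1kTokens`, `quality`, `route` and the USD-per-1,000-tokens pricing unit are hypothetical, and reading cost-then-quality as "cheapest first, ties broken by higher quality" is an assumption.

```typescript
// Hypothetical types for illustration only; not modelchain's actual API.
interface Provider {
  name: string;
  costPer1kTokens: number; // assumed unit: USD per 1,000 tokens
  quality: number; // assumed scale: 0..1, higher is better
}

type Strategy = "cost-first" | "quality-first" | "cost-then-quality";
type Comparator = (a: Provider, b: Provider) => number;

const byCost: Comparator = (a, b) => a.costPer1kTokens - b.costPer1kTokens;
const byQuality: Comparator = (a, b) => b.quality - a.quality;

// One comparator per strategy. "cost-then-quality" is read here as
// "cheapest first, ties broken by higher quality" (an assumption).
const comparators: Record<Strategy, Comparator> = {
  "cost-first": byCost,
  "quality-first": byQuality,
  "cost-then-quality": (a, b) => byCost(a, b) || byQuality(a, b),
};

// Return the first ranked provider whose estimated cost fits the remaining
// budget; refuse the request outright if none does (a hard ceiling).
function route(
  pool: Provider[],
  strategy: Strategy,
  estTokens: number,
  remainingBudgetUsd: number,
): Provider {
  const ranked = [...pool].sort(comparators[strategy]); // copy; pool untouched
  for (const p of ranked) {
    const estCostUsd = (estTokens / 1000) * p.costPer1kTokens;
    if (estCostUsd <= remainingBudgetUsd) return p;
  }
  throw new Error("budget ceiling: no provider fits the remaining budget");
}

// Usage with made-up provider names and prices.
const pool: Provider[] = [
  { name: "provider-a", costPer1kTokens: 0.002, quality: 0.7 },
  { name: "provider-b", costPer1kTokens: 0.01, quality: 0.9 },
  { name: "provider-c", costPer1kTokens: 0.002, quality: 0.8 },
];

console.log(route(pool, "cost-first", 1000, 1).name); // provider-a
console.log(route(pool, "cost-then-quality", 1000, 1).name); // provider-c
console.log(route(pool, "quality-first", 1000, 0.005).name); // provider-c (provider-b is over budget)
```

Sorting a copy leaves the declared pool untouched, and because `Array.prototype.sort` is stable, providers that rank equally keep their declared order. That determinism is what makes routing decisions practical to lock down with golden tests.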
The non-obvious part: it measures every response with pluggable scorers and feeds that score back into the next routing decision, so the pool adapts as providers ship new models instead of following a static rule table.
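One way to picture the feedback loop: average the pluggable scorers for each response, fold the result into a per-provider exponential moving average, and let a quality-first pick read that average. This is a sketch under stated assumptions. The EMA, the `Scorer` signature, `QualityTracker` and its `alpha`/`prior` parameters are all hypothetical, since the post does not say how modelchain aggregates scores.

```typescript
// Hypothetical scorer signature: maps a prompt/response pair to [0, 1].
type Scorer = (prompt: string, response: string) => number;

// Two toy scorers for illustration.
const nonEmpty: Scorer = (_prompt, response) => (response.trim().length > 0 ? 1 : 0);
const validJson: Scorer = (_prompt, response) => {
  try {
    JSON.parse(response);
    return 1;
  } catch {
    return 0;
  }
};

// Keeps an exponential moving average (EMA) of scores per provider.
// The EMA is an assumed aggregation, not modelchain's documented method.
class QualityTracker {
  private readonly ema = new Map<string, number>();
  private readonly alpha: number; // weight given to the newest score
  private readonly prior: number; // quality assumed before any observations

  constructor(alpha = 0.2, prior = 0.5) {
    this.alpha = alpha;
    this.prior = prior;
  }

  quality(provider: string): number {
    return this.ema.get(provider) ?? this.prior;
  }

  // Average all scorers for one response, then fold that into the EMA.
  record(provider: string, prompt: string, response: string, scorers: Scorer[]): number {
    if (scorers.length === 0) throw new Error("at least one scorer is required");
    const score = scorers.reduce((sum, s) => sum + s(prompt, response), 0) / scorers.length;
    const next = this.alpha * score + (1 - this.alpha) * this.quality(provider);
    this.ema.set(provider, next);
    return next;
  }

  // Quality-first pick driven by observed scores (ties keep the earlier name).
  best(providers: string[]): string {
    return providers.reduce((a, b) => (this.quality(b) > this.quality(a) ? b : a));
  }
}

// Usage: provider-b starts behind, then overtakes after two good responses.
const tracker = new QualityTracker(0.5, 0.5);
const scorers = [nonEmpty, validJson];
tracker.record("provider-a", "p", '{"ok":true}', scorers); // 0.75
tracker.record("provider-b", "p", "not json", scorers); // 0.5
console.log(tracker.best(["provider-a", "provider-b"])); // provider-a
tracker.record("provider-b", "p", '{"ok":true}', scorers); // 0.75
tracker.record("provider-b", "p", '{"ok":true}', scorers); // 0.875
console.log(tracker.best(["provider-a", "provider-b"])); // provider-b
```

A smaller `alpha` makes the ranking slower to react to a single bad response; a larger one lets a newly shipped model climb the ranking faster.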
Proof: zero runtime dependencies, 182 tests across 12 suites passing under Vitest 4, 76% line coverage, a 5.6 KB (brotli-compressed) core, and releases published with SLSA provenance. A golden routing suite locks in every strategy's decision as part of the SemVer contract.
Try the CLI proxy: npx takk/modelchain start --port 8788
The mental model is Prisma, not LangChain. Feedback and pointers to prior art are welcome.