I built takk/modelchain because picking the right LLM per request, and re-implementing streaming, tool calling and failover for every provider, kept leaking into every call site.

It sits between your app and any number of providers (OpenAI, Anthropic, Gemini, or any OpenAI-compatible HTTP endpoint). You declare a pool with cost and keys; it routes each request via 7 strategies (cost-first, quality-first, cost-then-quality, latency-first, weighted, round-robin, sequential-fallback), streams a normalised AsyncIterable, normalises tool calls across providers, and enforces hard budget ceilings.
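To make the routing idea concrete, here is a minimal sketch of three of the strategies plus a hard budget ceiling. It is an illustration only, not modelchain's actual API: the names `PoolEntry`, `Budget`, `rank` and `pick`, the USD-per-1k-tokens cost unit, and the 0..1 quality score are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; not the library's real types.
interface PoolEntry {
  name: string;
  costPer1kTokens: number; // assumed unit: USD per 1,000 tokens
  quality: number;         // assumed range: 0..1, higher is better
}

type Strategy = "cost-first" | "quality-first" | "cost-then-quality";

// One comparator per strategy; a negative result means "a ranks before b".
const comparators: Record<Strategy, (a: PoolEntry, b: PoolEntry) => number> = {
  "cost-first": (a, b) => a.costPer1kTokens - b.costPer1kTokens,
  "quality-first": (a, b) => b.quality - a.quality,
  // Cheapest first; ties on cost are broken by higher quality.
  "cost-then-quality": (a, b) =>
    a.costPer1kTokens - b.costPer1kTokens || b.quality - a.quality,
};

// Returns a new array ordered by the chosen strategy (input is not mutated).
function rank(pool: PoolEntry[], strategy: Strategy): PoolEntry[] {
  return [...pool].sort(comparators[strategy]);
}

// Hard ceiling: a request is refused if it would push spend past the limit.
class Budget {
  private readonly ceilingUsd: number;
  private spentUsd = 0;

  constructor(ceilingUsd: number) {
    this.ceilingUsd = ceilingUsd;
  }

  canAfford(estimateUsd: number): boolean {
    return this.spentUsd + estimateUsd <= this.ceilingUsd;
  }

  record(costUsd: number): void {
    this.spentUsd += costUsd;
  }
}

// Picks the best-ranked entry whose estimated cost still fits the budget.
function pick(
  pool: PoolEntry[],
  strategy: Strategy,
  budget: Budget,
  estimatedTokens: number,
): PoolEntry | undefined {
  return rank(pool, strategy).find((entry) =>
    budget.canAfford((entry.costPer1kTokens * estimatedTokens) / 1000),
  );
}
```

In this sketch, the budget check runs after ranking, so a quality-first request falls through to the next-best model when the top one would exceed the ceiling, rather than failing outright.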

The non-obvious part: it measures every response with pluggable scorers and feeds that score back into the next routing decision, so the pool adapts as providers ship new models instead of relying on a static rule table.
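A rough sketch of what such a feedback loop could look like, under stated assumptions: scorers are plain functions returning 0..1, scores are blended with an exponential moving average, and unseen models start from a neutral 0.5 prior. `Scorer`, `QualityTracker`, `nonEmpty` and `parsesAsJson` are hypothetical names for this example, and the averaging scheme is my choice for illustration, not necessarily what modelchain does internally.

```typescript
// A scorer maps a response to a value in 0..1 (assumed convention).
type Scorer = (response: string) => number;

// Two toy scorers, purely illustrative.
const nonEmpty: Scorer = (r) => (r.trim().length > 0 ? 1 : 0);
const parsesAsJson: Scorer = (r) => {
  try {
    JSON.parse(r);
    return 1;
  } catch {
    return 0;
  }
};

class QualityTracker {
  private readonly alpha: number; // weight given to the newest sample
  private readonly scores = new Map<string, number>();

  constructor(alpha = 0.3) {
    this.alpha = alpha;
  }

  // Scores one response and folds it into the model's running average.
  observe(model: string, response: string, scorers: Scorer[]): number {
    if (scorers.length === 0) throw new Error("at least one scorer is required");
    const sample =
      scorers.reduce((sum, score) => sum + score(response), 0) / scorers.length;
    const previous = this.scores.get(model);
    const next =
      previous === undefined ? sample : previous + this.alpha * (sample - previous);
    this.scores.set(model, next);
    return next;
  }

  // Current score; models with no observations get a neutral 0.5 (assumption).
  scoreOf(model: string): number {
    return this.scores.get(model) ?? 0.5;
  }

  // The model a quality-driven strategy would try first.
  best(models: string[]): string | undefined {
    let winner: string | undefined;
    for (const model of models) {
      if (winner === undefined || this.scoreOf(model) > this.scoreOf(winner)) {
        winner = model;
      }
    }
    return winner;
  }
}
```

The moving average is what lets the ranking shift over time: a model that starts producing worse responses loses its lead after a few observations, without anyone editing a rule table.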

Proof: zero runtime dependencies, 182 tests across 12 suites passing under Vitest 4, 76% line coverage, 5.6 KB brotli core, published with SLSA provenance. A golden routing suite locks every strategy's decision as part of the SemVer contract.

Try the CLI proxy: npx takk/modelchain start --port 8788

Mental model is Prisma, not LangChain. Feedback and prior-art pointers welcome.

modelchain

Cost-aware LLM router with streaming and tool calls
