How are you handling model selection per step in production workflows?
LLM pricing volatility has made single-model dependence a real operational risk. Multi-model orchestration is quickly becoming a baseline expectation, not a differentiator.
Here's a concrete example from a recent banking deployment:
Step 1 - Document intake: Llama 3 (self-hosted) reads and classifies KYC documents. PII never leaves the network.
Step 2 - Reasoning over a 200-page credit history: Claude via a controlled gateway, for nuanced multi-document analysis.
Step 3 - Customer-facing summary generation: GPT-4o for tone and speed.
Step 4 - Audit logging: every step records which model was used, what inputs went in, and what came back. A compliance reviewer can replay any decision.
On this specific workflow, LLM cost dropped roughly 40% compared to routing everything through a single frontier model.
How are other teams here approaching this? Manual config per step, dynamic routing based on task type, or something in between?

Replies