How are you handling model selection per step in production workflows?

LLM pricing volatility has made single-model dependence a real operational risk. Multi-model orchestration is quickly becoming a baseline expectation, not a differentiator.

Here's a concrete example from a recent banking deployment:

Step 1 - Document intake: Llama 3 (self-hosted) reads and classifies KYC documents. PII never leaves the network.

Step 2 - Reasoning over a 200-page credit history: Claude via a controlled gateway, for nuanced multi-document analysis.

Step 3 - Customer-facing summary generation: GPT-4o for tone and speed.

Step 4 - Audit logging: every step records which model was used, what inputs went in, and what came back. A compliance reviewer can replay any decision.
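For anyone who wants something concrete to compare against, here is a minimal sketch of the "manual config per step" variant of the four steps above. It is not the deployment's actual code: the step names, the `StepConfig` and `AuditLog` classes, the `route` labels, and the `fake_call_model` stub are all hypothetical, and the real provider calls are left out on purpose.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class StepConfig:
    model: str  # illustrative label, not an official model ID
    route: str  # "self_hosted" or "gateway" (hypothetical labels)


# Hypothetical static mapping of workflow step -> model, mirroring Steps 1-3.
STEP_MODELS: Dict[str, StepConfig] = {
    "document_intake": StepConfig(model="llama-3", route="self_hosted"),
    "credit_reasoning": StepConfig(model="claude", route="gateway"),
    "customer_summary": StepConfig(model="gpt-4o", route="gateway"),
}


@dataclass
class AuditLog:
    """In-memory stand-in for Step 4; a real system would use durable storage."""
    records: List[dict] = field(default_factory=list)

    def record(self, step: str, config: StepConfig, prompt: str, output: str) -> dict:
        entry = {
            "timestamp": time.time(),
            "step": step,
            "model": config.model,
            "route": config.route,
            "input": prompt,    # stored verbatim so a reviewer can replay the call
            "output": output,
        }
        self.records.append(entry)
        return entry


def run_step(
    step: str,
    prompt: str,
    call_model: Callable[[StepConfig, str], str],
    audit: AuditLog,
) -> str:
    """Look up the model configured for this step, call it, and log the exchange."""
    config = STEP_MODELS[step]
    output = call_model(config, prompt)
    audit.record(step, config, prompt, output)
    return output


def fake_call_model(config: StepConfig, prompt: str) -> str:
    """Stub standing in for a real client; it only echoes which model was chosen."""
    return f"[{config.model}] handled: {prompt}"


if __name__ == "__main__":
    audit = AuditLog()
    run_step("document_intake", "classify this KYC document", fake_call_model, audit)
    run_step("customer_summary", "summarize the decision", fake_call_model, audit)
    for rec in audit.records:
        print(rec["step"], "->", rec["model"], "via", rec["route"])
```

Because the log here stores inputs verbatim, it would need to live inside the same network boundary as the self-hosted step if those inputs contain PII. Swapping the static `STEP_MODELS` lookup for a function of the task type is one way to move toward dynamic routing without changing the audit path.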

On this specific workflow, LLM cost dropped roughly 40% compared to routing everything through a single frontier model.

How are other teams here approaching this? Manual config per step, dynamic routing based on task type, or something in between?
