Ollama is what I reach for when I want a model running locally in under a minute, and that matters more than people admit. The one-line pulls and the OpenAI-compatible endpoint mean I can drop it under existing code with almost no changes. Where it still bites me: concurrent requests get serialized in ways that aren't obvious until you load-test, and juggling VRAM across a few models gets fiddly. For local dev and prototyping though, nothing else is this frictionless.