Ollama v0.19 rebuilds Apple Silicon inference on top of MLX, bringing much faster local performance for coding and agent workflows. It also adds NVFP4 support, plus smarter cache reuse, snapshots, and eviction for more responsive sessions.
Ollama's new official desktop app for macOS and Windows makes it easy to run open-source models locally. Chat with LLMs, use multimodal models with images, or reason about files, all from a simple, private interface.
Ollama v0.7 introduces a new engine for first-class multimodal AI, starting with vision models like Llama 4 and Gemma 3. It offers improved reliability, accuracy, and memory management for running LLMs locally.