Nemotron 3 Ultra by NVIDIA - Powers faster, efficient reasoning for long-running agents
A 550B MoE frontier-intelligence open model built for long-running agents.
It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models. Ultra excels at complex tasks like coding and deep research.
Long-running agents spend their time planning, using tools, recovering from failures, and deciding what to do next.

Replies
NVIDIA just shipped Nemotron 3 Ultra, a 550B open frontier model purpose-built for long-running AI agents.
Most frontier reasoning models are optimised for single-turn accuracy. Agentic tasks are different: agents plan, call tools, delegate to sub-agents, handle failures, and pass history back into the model across many turns. As sessions get longer, token costs compound and models start losing the thread.
Nemotron 3 Ultra addresses this with a hybrid Mamba-Transformer architecture that handles long-context sequences without losing recall, and NVFP4 quantisation that delivers 5x higher throughput per GPU compared to BF16 on Blackwell.
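To make the "token costs compound" point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the full history is re-sent on every turn and that each turn adds a fixed number of tokens. Both are simplifications, the numbers are illustrative, and prompt caching is ignored. The function name is hypothetical and not part of any NVIDIA tooling.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens processed when the whole history is re-sent each turn.

    Assumption: every turn appends `tokens_per_turn` tokens (user message,
    tool output, model reply) and the model re-reads everything so far.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # new content appended to the session
        total += history            # the model reads the full history again
    return total


if __name__ == "__main__":
    for turns in (10, 100):
        print(turns, cumulative_input_tokens(turns, 1_000))
```

Under these assumptions the total grows with the square of the number of turns (tokens_per_turn * turns * (turns + 1) / 2), which is why long sessions get expensive much faster than the turn count alone suggests.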
Here's what ships:
550B total / 55B active parameters via LatentMoE: frontier reasoning without activating the full model on every token
Up to 1M token context window: handles large codebases, long tool-call chains, and multi-document synthesis natively
Multi-token prediction layers: reduce generation time on long outputs and multi-turn workflows
Post-trained for OpenClaw, Hermes Agent, and LangChain Deep Agents: accurate across agent harnesses, not just chat benchmarks
Multi-Teacher On-Policy Distillation: trained with dense feedback from 10+ domain-specific teacher models across code, math, and tool use
Fully open: weights, synthetic training data, and post-training recipes all released under OpenMDW-1.1
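For a feel of what the parameter numbers above imply, here is a small arithmetic sketch in Python. It uses only the 550B total / 55B active figures from the list. The helper names are hypothetical, and the memory estimate counts raw weight bits only: it assumes 16 bits per weight for BF16 and 4 bits per weight for NVFP4, and it ignores quantisation scale metadata, the KV cache, and activations, so treat the results as rough lower bounds and not deployment figures.

```python
TOTAL_PARAMS = 550_000_000_000   # 550B total parameters (from the launch post)
ACTIVE_PARAMS = 55_000_000_000   # 55B active parameters per token (from the launch post)


def active_fraction(total: int, active: int) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


def weight_memory_gb(params: int, bits_per_param: int) -> float:
    """Raw weight storage in GB (1 GB = 1e9 bytes), ignoring any metadata."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print(f"active share per token: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
    print(f"BF16 weights (16-bit):  {weight_memory_gb(TOTAL_PARAMS, 16):.0f} GB")
    print(f"4-bit weights:          {weight_memory_gb(TOTAL_PARAMS, 4):.0f} GB")
```

Roughly a tenth of the model is active per token, and 4-bit weights take about a quarter of the BF16 footprint. This arithmetic says nothing about the quoted 5x throughput figure, which depends on hardware and kernels, not just weight size.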
P.S. I hunt the latest and greatest launches in tech, SaaS and AI, follow to be notified → @rohanrecommends