Nemotron 3 Ultra by NVIDIA - Powers faster, efficient reasoning for long-running agents

by•2mo ago

A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models. Ultra excels at complex tasks like coding and deep research. Long-running agents spend their time planning, using tools, recovering from failures, and deciding what to do next.

Replies

Hunter

NVIDIA just shipped Nemotron 3 Ultra, a 550B open frontier model purpose-built for long-running AI agents.

Most frontier reasoning models are optimised for single-turn accuracy. Agentic tasks are different: agents plan, call tools, delegate to sub-agents, handle failures, and pass history back into the model across many turns. As sessions get longer, token costs compound and models start losing the thread.
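To make the compounding concrete, here is a minimal sketch that assumes every turn resends the full history and adds a fixed number of new tokens. The function name and the 2,000-tokens-per-turn figure are illustrative assumptions, not measurements from Nemotron 3 Ultra, and the sketch ignores prompt caching and history compaction, which real harnesses may use.

```python
def cumulative_prompt_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens processed when each turn resends the whole history.

    Hypothetical helper for illustration only.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # new user/tool/assistant content this turn
        total += history            # the full history is passed back to the model
    return total


if __name__ == "__main__":
    for turns in (10, 100, 1000):
        print(turns, cumulative_prompt_tokens(turns, tokens_per_turn=2_000))
```

Under these assumptions the total grows roughly with the square of the number of turns, which is why long sessions get expensive faster than their length alone suggests.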

Nemotron 3 Ultra addresses this with a hybrid Mamba-Transformer architecture that handles long-context sequences without losing recall, and NVFP4 quantisation that delivers 5x higher throughput per GPU compared to BF16 on Blackwell.

Here's what ships:

550B total / 55B active parameters via LatentMoE: you get frontier reasoning without activating the full model on every token
Up to 1M token context window: handles large codebases, long tool-call chains, and multi-document synthesis natively
Multi-token prediction layers: reduce generation time on long outputs and multi-turn workflows
Post-trained for OpenClaw, Hermes Agent, and LangChain Deep Agents: accurate across agent harnesses, not just chat benchmarks
Multi-Teacher On-Policy Distillation: trained with dense feedback from 10+ domain-specific teacher models across code, math, and tool use
Fully open: weights, synthetic training data, and post-training recipes all released under OpenMDW-1.1
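As a rough back-of-the-envelope check on the parameter figures above, the sketch below computes the active-parameter fraction and the raw weight footprint at 16 bits versus 4 bits per parameter. It assumes every weight is stored at the stated width and ignores the block scale factors that NVFP4 carries, any layers kept at higher precision, and the KV cache, so treat the results as approximations rather than published numbers.

```python
TOTAL_PARAMS = 550_000_000_000   # 550B total, from the launch post
ACTIVE_PARAMS = 55_000_000_000   # 55B active per token, from the launch post


def weight_gigabytes(params: int, bits_per_param: int) -> float:
    """Raw weight storage in GB (10^9 bytes), with no quantisation metadata."""
    return params * bits_per_param / 8 / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
bf16_gb = weight_gigabytes(TOTAL_PARAMS, 16)
fp4_gb = weight_gigabytes(TOTAL_PARAMS, 4)

if __name__ == "__main__":
    print(f"active fraction: {active_fraction:.0%}")
    print(f"BF16 weights:  ~{bf16_gb:.0f} GB")
    print(f"4-bit weights: ~{fp4_gb:.0f} GB")
```

So about a tenth of the parameters are active on each token, and 4-bit storage cuts the raw weight footprint to about a quarter of BF16 before overheads.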

P.S. I hunt the latest and greatest launches in tech, SaaS, and AI. Follow to be notified → @rohanrecommends

2mo ago

550B params (55B active), 1M context, 300 tok/sec. Probably the strongest US open-weights model out there right now, and it's currently available for free on @Kilo Code

ouch.

2mo ago

A lot of frontier models are improving raw reasoning, but context management still feels like a separate bottleneck.

Have you seen longer-horizon agent workloads benefit more from the model improvements themselves, or from better retrieval and memory layers around them?

2mo ago

Goldfish

Big release. What’s interesting to me is less the “bigger context window” headline and more what it means for actual agent runs, where most of the work is planning, tool calls, backtracking, and keeping state over time.

I’m curious how you’re seeing people use Nemotron 3 Ultra alongside retrieval or external memory. With a 1M context window, does that layer become less important, or does it just shift toward deciding what should live in memory vs what gets passed straight into the run?

2mo ago