NVIDIA Nemotron 3 Ultra - The first open frontier model built for agents

by•2mo ago

NVIDIA's 550B Mixture-of-Experts model with a hybrid Mamba-Attention architecture, delivering 300+ tokens/sec with a 1M-token context window. The top-ranked US open-weights model on the Artificial Analysis Intelligence Index. Built specifically for multi-step agent loops, where frontier reasoning at open-source economics actually matters. Available now on Hugging Face, OpenRouter, and ModelScope, and as a NIM microservice on build.nvidia.com.

Replies

Hunter

Been waiting on this one since Jensen teased it at Computex. Nemotron 3 Ultra is what I've wanted from an open model — actually fast (300+ tok/s), actually long-context (1M), actually agent-ready. I run a lot of multi-step agent loops for service business clients, and per-call latency on hosted frontier models kills the unit economics. This changes the math. Curious what people are seeing routing through OpenRouter vs running locally via NIM. Would love to hear from the NVIDIA team about the post-training datasets released alongside the weights.

2mo ago