Gemini Robotics ER 1.6 - Google's SOTA robotics model for visual & spatial reasoning!
Gemini Robotics-ER 1.6 is a vision-language model for robot reasoning.
It handles spatial pointing, multi-view success detection, and instrument reading.
For robotics engineers and developers building physical agents via the Gemini API.

Replies
Gemini Robotics-ER 1.6 is the reasoning layer that lets robots like Boston Dynamics' Spot read analog gauges, count objects, and confirm when a task is actually done. Available now via the Gemini API.
I'm hunting this because there's a gap between "robot that follows instructions" and "robot that reasons about what it sees". That gap is exactly where industrial automation keeps getting stuck, and it's what ER 1.6 is built to close.
The problem: Most robot AI can execute; very little can verify. Knowing when a task has actually succeeded, reading a pressure dial in a poorly lit facility, or picking the correct object out of 40 near-identical ones takes embodied reasoning, not just vision.
The solution: A vision-language model that handles pointing, spatial counting, multi-view success detection, and instrument reading as first-class capabilities. It can call tools natively and chain reasoning steps to solve complex physical tasks.
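To make the "call tools and chain reasoning steps" part concrete, here's a minimal sketch of the tool-dispatch pattern an agent loop around a model like this typically uses. The tool names (`read_gauge`, `count_objects`), their signatures, and the request shape are all hypothetical placeholders, not the actual Gemini API tool interface:

```python
import json

# Hypothetical tool registry: these names and stub implementations are
# illustrative, not the real Gemini API function-calling schema.
def read_gauge(camera_id: str) -> float:
    """Stub: return a pressure reading from the named camera."""
    return 4.2  # placeholder value

def count_objects(label: str) -> int:
    """Stub: return how many objects matching `label` are in view."""
    return 3  # placeholder value

TOOLS = {"read_gauge": read_gauge, "count_objects": count_objects}

def dispatch(tool_calls: list[dict]) -> list[dict]:
    """Run each tool the model requested and collect the results
    so they can be fed back into the next model turn."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["name"]]
        results.append({"name": call["name"], "result": fn(**call["args"])})
    return results

# A model turn might request tools like this (shape is illustrative only):
requested = [
    {"name": "read_gauge", "args": {"camera_id": "cam_front"}},
    {"name": "count_objects", "args": {"label": "valve"}},
]
print(json.dumps(dispatch(requested)))
```

The point of the loop is that the model never touches hardware directly: it emits structured tool requests, your code executes them, and the results go back in as context for the next reasoning step.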
Key capabilities:
Spatial pointing: detect objects, map paths, find grasp points
Success detection: confirm tasks across multiple camera views
Instrument reading: read gauges, sight glasses, digital displays (93% accuracy)
Agentic tools: integrate Google Search, VLA models, custom functions
Safety constraints: respects material and weight limits
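For the spatial pointing capability, Gemini's spatial-understanding outputs are typically JSON points as `[y, x]` pairs normalized to a 0-1000 range; whether ER 1.6 uses exactly this schema is an assumption here, but the denormalization step you'd need downstream looks like this:

```python
import json

def denormalize(points_json: str, width: int, height: int) -> list[dict]:
    """Convert normalized [y, x] points (0-1000 range, an assumed schema)
    into pixel coordinates for a camera frame of the given size."""
    out = []
    for item in json.loads(points_json):
        y, x = item["point"]  # assumed [y, x] ordering
        out.append({
            "label": item["label"],
            "x_px": round(x / 1000 * width),
            "y_px": round(y / 1000 * height),
        })
    return out

# Example model output (illustrative, not captured from the real API):
raw = '[{"point": [500, 250], "label": "shutoff valve"}]'
print(denormalize(raw, width=1280, height=720))
```

A point at `[500, 250]` on a 1280x720 frame lands at pixel (320, 360), which is what you'd hand to a grasp planner or overlay on the camera feed.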
Who it's for: Robotics engineers, hardware AI teams, and developers building autonomous inspection or manipulation systems. Especially useful if you are integrating AI reasoning into industrial or field robotics.
P.S. I hunt the latest and greatest launches in tech, SaaS and AI, follow to be notified → @rohanrecommends
@rohanrecommends How does ER 1.6 handle edge cases like rusty/dirty gauges in real factories, and what's the latency like on Spot for chaining those reasoning steps?
The spatial reasoning piece is what makes this interesting. That's been the hard problem for physical AI for a long time.