OK, really excited about this one because it takes a huge step forward in visual context.
Tested it by asking it to find all the red dots in an image. Instead of trying to "eyeball" it (which models usually get wrong), Gemini 3 Flash recognized that counting by eye is imprecise. So it decided to act like an engineer and wrote an OpenCV script to count them accurately.
The logic flow was fascinating:
Task: Precision counting.
Reasoning: Visual models have error margins -> I should use Python tools.
Action: Filter pixels via HSV color space -> Use findContours to locate them.
This actually blew my mind. Seeing a model natively close the "Perception → Reasoning → Action" loop in vision is critical for real-world apps.
The demos in Google AI Studio are also worth checking out. Definitely some of the most interesting and inspiring visual use cases I've seen.
Impressive direction. The real value here isn’t just “bigger model,” but how naturally Gemini works across modalities. If the text–image–code handoff feels truly seamless in real-world workflows, this could change how people actually use AI day to day — not just experiment with it.