OK, really excited about this one because it takes a huge step forward in visual context.
Tested it by asking it to find all the red dots in an image. Instead of trying to "eyeball" it (which models usually get wrong), Gemini 3 Flash recognized that counting by eye is imprecise. So it decided to act like an engineer and wrote an OpenCV script to count them accurately.
The logic flow was fascinating:
Task: Precision counting.
Reasoning: Visual models have error margins -> I should use Python tools.
Action: Filter pixels via HSV color space -> Use findContours to locate them.
This actually blew my mind. Seeing a model natively close the "Perception → Reasoning → Action" loop in vision is critical for real-world apps.
The demos in Google AI Studio are also worth checking out. Definitely some of the most interesting and inspiring visual use cases I've seen.
Impressive direction. The real value here isn’t just “bigger model,” but how naturally Gemini works across modalities. If the text–image–code handoff feels truly seamless in real-world workflows, this could change how people actually use AI day to day — not just experiment with it.