Vision Banana From Google DeepMind
Image Generators are Generalist Vision Learners
Unified model that outperforms SoTA specialist models on various vision tasks! By treating 2D/3D vision tasks as image generation, we unlock a new foundation for CV.

Hey Hunters!
Excited to share something really interesting from Google DeepMind: Vision Banana.
It's a new kind of vision model that flips the usual approach. Instead of building separate models for different vision tasks, it treats everything as image generation.
The idea is simple but powerful:
All outputs are represented as RGB images, and everything is controlled through text prompts.
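To make "outputs as RGB images" concrete, here is a minimal sketch of how a segmentation mask or a depth value could be written into RGB pixels and read back out. The palette, the grayscale depth encoding, and every function name are illustrative assumptions, not Vision Banana's actual scheme.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette (class id -> colour); not taken from Vision Banana.
PALETTE: Dict[int, RGB] = {
    0: (0, 0, 0),      # background
    1: (255, 0, 0),    # e.g. "person"
    2: (0, 255, 0),    # e.g. "car"
}


def encode_mask(mask: List[List[int]]) -> List[List[RGB]]:
    """Render a grid of class ids as an RGB image using the palette."""
    return [[PALETTE[c] for c in row] for row in mask]


def decode_mask(image: List[List[RGB]]) -> List[List[int]]:
    """Map each pixel to the class with the nearest palette colour.

    Nearest-colour matching tolerates the small colour errors a
    generated image would contain.
    """
    def nearest(pixel: RGB) -> int:
        return min(
            PALETTE,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, PALETTE[c])),
        )

    return [[nearest(p) for p in row] for row in image]


def encode_depth(depth_m: float, max_depth_m: float = 10.0) -> RGB:
    """Assumed encoding: clip depth to [0, max] and store it as a gray level."""
    clipped = min(max(depth_m, 0.0), max_depth_m)
    level = round(clipped / max_depth_m * 255)
    return (level, level, level)


def decode_depth(pixel: RGB, max_depth_m: float = 10.0) -> float:
    """Average the three channels and rescale back to metres."""
    return sum(pixel) / 3 / 255 * max_depth_m
```

Under these assumptions, swapping tasks means swapping the decoder applied to the generated image, while the model's output format stays the same.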
What makes it stand out:
• Works across both 2D and 3D vision tasks
• Achieves strong zero-shot performance
• No task-specific heads or complex training tricks
And the surprising part?
It still keeps its original image generation ability while handling advanced vision tasks.
This shows a bigger shift happening:
Image generation might become the universal interface for computer vision.
Curious to hear your thoughts: is this the future of CV?