Hey Hunters 👋

Excited to share something really interesting from Google DeepMind: Vision Banana 🍌.

It’s a new kind of vision model that flips the usual approach. Instead of building separate models for different vision tasks, it treats everything as image generation.

👉 The idea is simple but powerful:
All outputs are represented as RGB images, and everything is controlled through text prompts.
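
To make that concrete, here's a minimal sketch (my own illustration, not DeepMind's code) of how a vision output like a segmentation mask can be written as an RGB image and read back. The palette, the class names, and the `encode_mask` / `decode_mask` helpers are all hypothetical, and the encoding Vision Banana actually uses may differ.

```python
# Hypothetical illustration only: this palette and these helpers are not
# from Vision Banana. They show one way a label map can live in RGB pixels.
PALETTE = {
    0: (0, 0, 0),      # background
    1: (255, 0, 0),    # assumed class, e.g. "person"
    2: (0, 255, 0),    # assumed class, e.g. "banana"
}


def encode_mask(labels):
    """Turn a 2D grid of class ids into a 2D grid of RGB tuples."""
    return [[PALETTE[c] for c in row] for row in labels]


def decode_mask(rgb):
    """Map each pixel back to the class whose palette color is nearest."""
    def nearest(pixel):
        return min(
            PALETTE,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(pixel, PALETTE[c])),
        )
    return [[nearest(px) for px in row] for row in rgb]


labels = [[0, 1], [2, 2]]
image = encode_mask(labels)
# A generated image is rarely pixel-exact, so perturb one pixel slightly.
image[0][1] = (250, 6, 3)
assert decode_mask(image) == labels
```

Decoding by nearest color rather than exact match is a design choice here: it tolerates the small color drift you'd expect when the "mask" comes out of an image generator instead of a classifier head.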

What makes it stand out:
• Works across both 2D and 3D vision tasks
• Achieves strong zero-shot performance
• No task-specific heads or complex training tricks

And the surprising part?
It still keeps its original image generation ability while handling advanced vision tasks.

This points to a bigger shift:

👉 Image generation might become the universal interface for computer vision.

Curious to hear your thoughts: is this the future of CV? 🚀