Qwen3-VL is the new flagship vision-language model from the Qwen team, excelling at visual agent tasks, long-video understanding, and spatial reasoning with a native 256K context window.
First up is Qwen3-VL, their new flagship vision-language model. Its visual agent capabilities are a huge leap: it can actually operate GUIs on phones and PCs. It also has strong visual coding skills, turning mockups into real code. And the native 256K context (expandable to 1M) means it can process things like two-hour videos.
They also dropped Qwen3-Max, their new flagship text model with supercharged coding and agent skills.
On top of all that, the release includes an upgraded Qwen3-Coder, a real-time translator, and a new safety model series. With these releases, the Qwen3 series looks to have hit a new high in both multimodal capabilities and intelligence.
Replies
Flowtica Scribe
Hi everyone!
Qwen is cooking!