0.8B–9B native multimodal with more intelligence, less compute
Qwen just released the Qwen3.5 Small Model Series — 0.8B, 2B, 4B and 9B. Native multimodal with improved architecture and scaled RL. 0.8B and 2B are tiny and fast for edge devices, 4B makes a strong lightweight agent base, and 9B is already closing the gap with much larger models. Base versions released too.
Great launch! Qwen has been incredibly useful, especially when I reach a point where other AI services can no longer technically deliver what I need. I’m also excited to see it matching the “big players” in benchmark results. 2026 is shaping up to be very interesting.
I’ve been using Qwen for building a simple code and website generator, and it works really well for fast iterations. Great for prototyping and lightweight generation.
What needs improvement
I'd like more on the history pages: a section where we can re-edit the input, process, and output with an easy UX. Basically, better handling of edge cases without extra prompting.
vs Alternatives
I choose Qwen because it’s fast, lightweight, and great for turning ideas into simple, working code or websites. It was also the first web-based tool I explored for code generation, which made it easy to start prototyping right away.
How accurate is Qwen3 on real coding tasks you tried?
Quite good, but it still needs some touch-ups, especially on the logic.
Does Qwen3-Coder reduce PR review time or defects?
The new Qwen 3.5 small models are incredible on iPhone and iPad devices, packing a lot of intelligence in a small package. Amazing work by the Qwen team at Alibaba.
Flowtica Scribe
Hi everyone!
The Qwen team just dropped the Qwen3.5 Small Model Series: 0.8B, 2B, 4B, and 9B, along with their Base versions.
This release fills the missing piece for on-device deployment and completes the full Qwen3.5 matrix from 0.8B all the way to 397B. Now you have clear choices:
0.8B/2B for embedded/IoT/Mobile
4B for lightweight multimodal agents
9B for edge servers
Plus the bigger MoE models for heavier workloads.
The 9B is the real shocker: matching or beating GPT-OSS-120B on several key benchmarks while being 13x smaller.
Even Elon chimed in.
Edge AI is heating up fast. This opens up exciting new opportunities for AI hardware and local innovation.
Play with these models on @Ollama!
Fluent
@zaczuo Impressive release! Already played with all the small series both locally (MLX) and in the cloud. Now that's something that can be reliably and constantly used in agentic workflows!
Native multimodal at 0.8B is genuinely impressive - most teams trade off size for capability, but 262K context windows + text/image/video in under 1B parameters changes the edge deployment math.
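To see why a 262K context window changes the edge deployment math, here's a back-of-envelope KV-cache estimate. The layer/head/dimension numbers below are hypothetical placeholders for a sub-1B model, not published Qwen3.5 specs:

```python
# Back-of-envelope KV-cache size for a long-context small model.
# All architecture numbers here are ASSUMPTIONS for illustration,
# not published Qwen3.5 specs.

def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Keys + values: one entry per layer, per KV head, per position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical ~0.8B-class config: 24 layers, 4 KV heads (GQA), head_dim 128.
full_context = kv_cache_bytes(seq_len=262_144, n_layers=24,
                              n_kv_heads=4, head_dim=128)
print(f"KV cache at 262K tokens: {full_context / 2**30:.1f} GiB")  # -> 12.0 GiB
```

Halving `bytes_per_elem` (e.g. an 8-bit KV cache) halves that figure, which is often what makes long context feasible on-device at all.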
The 9B beating GPT-OSS-20B on GPQA Diamond is interesting. Curious about structured output reliability at 0.8B though - small models tend to drop JSON schema adherence under complex instructions. Is there a differentiated training approach for structured data tasks?
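Whatever the training answer turns out to be, a generic client-side mitigation for schema drift is to parse, validate, and retry with an error hint. A minimal sketch, model-agnostic; the stub replies stand in for a real client call:

```python
import json

def call_with_json_retry(model_call, prompt: str, required_keys: set,
                         max_retries: int = 2) -> dict:
    """Call a model, parse its reply as JSON, and retry with an error hint
    when the output is invalid or missing required keys.

    `model_call` is any callable(prompt) -> str; swap in your real client.
    """
    for _ in range(max_retries + 1):
        raw = model_call(prompt)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError as exc:
            error = f"invalid JSON: {exc}"
        else:
            if isinstance(obj, dict) and required_keys <= obj.keys():
                return obj
            error = "parsed, but not a JSON object with the required keys"
        # Feed the failure back so the next attempt can self-correct.
        prompt = f"{prompt}\n\nPrevious reply was rejected ({error}). Reply with JSON only."
    raise ValueError(f"no valid JSON after {max_retries + 1} attempts")

# Stub "model" that fails once, then complies.
replies = iter(['not json', '{"name": "qwen", "size_b": 9}'])
result = call_with_json_retry(lambda p: next(replies),
                              "Describe the model as JSON.",
                              {"name", "size_b"})
print(result["size_b"])  # -> 9
```

At 0.8B scale this kind of validate-and-retry loop tends to matter more, since one extra round trip is still cheap relative to the model.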
MTP for faster inference on constrained hardware is a smart addition. Real-world throughput numbers on consumer GPU vs Apple Silicon would help developers size their deployment targets.
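Absent official throughput numbers, measuring decode speed yourself is straightforward with a backend-agnostic harness like the sketch below; `fake_generate` is a placeholder for a real client (llama.cpp, MLX, an Ollama binding, etc.):

```python
import time

def tokens_per_second(generate, prompt: str, n_runs: int = 3) -> float:
    """Average decode throughput for any generate(prompt) -> list of tokens."""
    total_tokens, total_time = 0, 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        total_time += time.perf_counter() - start
        total_tokens += len(tokens)
    return total_tokens / total_time

# Stub backend emitting 64 tokens with a tiny artificial delay;
# replace with a call into your actual inference stack.
def fake_generate(prompt):
    time.sleep(0.01)
    return ["tok"] * 64

print(f"{tokens_per_second(fake_generate, 'hello'):.0f} tok/s")
```

Running the same harness against the same prompt set on a consumer GPU and on Apple Silicon gives exactly the sizing comparison asked for above.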
The 9B punching above its weight class is the real story here. Running capable models locally without needing a data center changes what's possible for privacy-conscious apps and edge deployments. Been waiting for small open-source models to close this gap.
The benchmarks on the 9B model are seriously wild - matching models 13x its size is no small feat, and having the full range from 0.8B to 9B gives developers real flexibility for edge deployment. Curious though: how does the multimodal performance hold up on the smaller 0.8B and 2B variants compared to the 9B?
Really cool to see Qwen3 pushing open-source AI forward; the hybrid reasoning + fast response approach is super interesting. Curious to know, what kinds of real-world applications or agents are you most excited to see people build with it?
@binyuan_hui @chen_cheng1 @junyang_lin Hi guys. Can non-technical developers use these models easily? What tooling or platforms do they support?
Flowtica Scribe
@kimberly_ross Try them on Locally AI :)
What kinds of tasks can be performed with these models?