Qwen3.5 - The 397B native multimodal agent with 17B active params

Flowtica Scribe

•3mo ago

An open-weight, native vision-language model built for long-horizon agentic tasks. Its hybrid architecture (linear attention + MoE) delivers the capabilities of a 397B giant with the inference speed of a 17B model.

Replies

Best

Flowtica Scribe

Hunter

📌

Hi everyone!

Qwen3.5 is here. It is a native vision-language model with a massive 397B parameter count.

Built on the Qwen3-Next architecture (Linear Attention + MoE), only 17B parameters are active per forward pass. This hits a specific sweet spot: you get the reasoning depth of a giant model with the inference latency of a much smaller one.

For applications, this efficiency is key for agents.

It is natively multimodal with no glued-on vision adapters, demonstrating outstanding results on agentic tasks. This means handling complex workflows without burning through tokens.

Apache 2.0 and ready for vLLM/SGLang out of the box!

Report

3mo ago

Fluent

Congrats @zaczuo !

Excited to test it against agentic workflows. Being a fan of Qwen3 – always a rock solid choice as a local model.

Report

3mo ago

Serving a 397B MoE native multimodal model for long-horizon agents will bottleneck on KV-cache growth and multimodal prefill latency, and expert-routing variance can reduce batching efficiency at high throughput. Best practice: run it under vLLM or SGLang with continuous batching plus paged KV cache, add aggressive prompt and image embedding caching, and lean on FP8 where supported to keep cost predictable. :contentReference[oaicite:0]{index=0} Question: what max context length are you targeting for Qwen3.5 in production and how stable is expert routing under long tool-using trajectories when served via vLLM or SGLang?

Report

3mo ago

397B with only 17B active params is impressive efficiency. The hybrid linear attention + MoE approach seems like the right direction for long-horizon agentic tasks. As someone building a vision AI app for pet health, I'm always watching open-weight multimodal models closely — excited to benchmark this against our current pipeline. Congrats on the release!

Report

3mo ago

The 17B active params with that level of capability is impressive — efficiency like this is what actually makes real-world agent use practical, not just demos.

Report

3mo ago