A family of SOTA speech models (0.6B and 1.7B) supporting 10 languages. Features prompt-based Voice Design, zero-shot voice cloning from 3 seconds of audio, and extremely low-latency streaming.
Qwen3-Coder is a new 480B MoE open model (35B active) by the Qwen team, built for agentic coding. It achieves SOTA results on benchmarks like SWE-bench, supports up to 1M context, and comes with an open-source CLI tool, Qwen Code.
Qwen3-235B-A22B-Thinking-2507 is a powerful open-source MoE model (22B active) built for deep reasoning. It achieves SOTA results on agentic tasks, supports a 256K context, and is available on Hugging Face and via API.
Qwen-Image is a new 20B open-source image foundation model by the Qwen team. It excels at complex text rendering (especially Chinese) and precise image editing, while also delivering strong general image generation. Available now in Qwen Chat.
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud. It understands text, audio, images, and video, and generates speech in real time.
Qwen-Image-Edit is the editing version of the 20B Qwen-Image model. It offers precise, model-native editing, including bilingual text modification and both high-level semantic and low-level appearance changes.
Qwen3-VL is the new flagship vision-language model from the Qwen team, excelling at visual agent tasks, long-video understanding, and spatial reasoning with a native 256K context window.
Qwen3-Next is a new family of models from the Qwen team, featuring a novel architecture that activates just 3B of its 80B parameters. This sparse design delivers performance comparable to much larger models with a >10x speedup, especially on long-context tasks.
Qwen3-ASR is a new high-accuracy speech recognition model. It supports 11 languages, excels at transcribing songs with background music, and features a unique contextual biasing system that accepts any text format to improve accuracy on specific terms.