Qwen-Image-2512 is the new open-source SOTA for text-to-image generation. It delivers dramatically improved photorealism, finer natural detail, and superior text rendering.
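A minimal generation sketch using diffusers, assuming the checkpoint is published under a Hugging Face repo ID like Qwen/Qwen-Image-2512 and loads through the same pipeline interface as the original Qwen-Image:

```python
import torch
from diffusers import DiffusionPipeline

# Repo ID is an assumption; the original model ships as "Qwen/Qwen-Image".
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-2512", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Text rendering is a headline feature, so the prompt embeds literal sign text.
image = pipe(
    prompt='A rain-soaked street at dusk, neon sign reading "OPEN 24 HOURS", photorealistic',
    num_inference_steps=50,
).images[0]
image.save("qwen_image_2512.png")
```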
Qwen-Image-Layered decomposes images into transparent RGBA layers, unlocking inherent editability. You can move, resize, or delete objects without artifacts. Supports recursive decomposition and variable layer counts.
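The decomposition API itself isn't reproduced here; the sketch below uses a hypothetical decompose_to_layers() placeholder for the model call and real Pillow alpha compositing to show what a layer stack buys you: move one layer, recomposite, and nothing else in the scene is disturbed.

```python
from PIL import Image

def decompose_to_layers(image: Image.Image, num_layers: int) -> list[Image.Image]:
    """Hypothetical placeholder for a Qwen-Image-Layered call: returns
    `num_layers` full-size RGBA layers (background first) that
    alpha-composite back into the input image."""
    raise NotImplementedError

scene = Image.open("scene.png").convert("RGBA")
layers = decompose_to_layers(scene, num_layers=4)

# Move the topmost object 50 px to the right by re-pasting its layer
# at an offset; the other layers are untouched, so no artifacts appear.
shifted = Image.new("RGBA", scene.size, (0, 0, 0, 0))
shifted.paste(layers[-1], (50, 0), layers[-1])
layers[-1] = shifted

# Recomposite bottom-up with standard alpha compositing.
result = layers[0]
for layer in layers[1:]:
    result = Image.alpha_composite(result, layer)
result.save("scene_edited.png")
```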
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
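One hedged way to exercise the audio loop is an OpenAI-compatible endpoint; the base URL, model name, and availability of audio input through this route are all assumptions here, not confirmed specifics:

```python
import base64
from openai import OpenAI

# Endpoint and model name are assumptions (DashScope-style compatible mode).
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

with open("question.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

# Stream the reply; requesting ["text", "audio"] instead would ask for speech back.
stream = client.chat.completions.create(
    model="qwen3-omni-flash",  # assumed model name
    messages=[{"role": "user", "content": [
        {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}},
        {"type": "text", "text": "Answer the question asked in the audio."},
    ]}],
    modalities=["text"],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```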
Qwen3-VL is the new flagship vision-language model from the Qwen team, excelling at visual agent tasks, long-video understanding, and spatial reasoning with a native 256K context window.
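A minimal image-question sketch with transformers, assuming a version recent enough to include Qwen3-VL and register it under AutoModelForImageTextToText; the repo ID and image URL are placeholders:

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Instruct"  # assumed repo ID; several sizes exist
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/chart.png"},  # placeholder
    {"type": "text", "text": "What trend does this chart show?"},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the echoed prompt.
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)[0])
```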
Qwen3-Next is a new family of models from the Qwen team, featuring a highly sparse MoE architecture that activates just 3B of its 80B parameters per token. This delivers performance comparable to much larger models at over 10x the inference throughput, especially on long-context tasks.
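Despite the architecture, usage is plain transformers causal-LM boilerplate; a minimal sketch, assuming the instruct checkpoint is published as Qwen/Qwen3-Next-80B-A3B-Instruct:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the tradeoffs of sparse MoE layers."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```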
Qwen3-ASR is a new high-accuracy speech recognition model. It supports 11 languages, excels at transcribing songs with background music, and features a contextual biasing system that accepts context in any text format, from keyword lists to whole documents, to improve accuracy on specific terms.
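The request below is purely illustrative of the contextual-biasing shape: arbitrary text rides along with the audio. The endpoint, auth header, and field names are hypothetical; consult the actual API docs for the real schema.

```python
import requests

API_URL = "https://example.com/v1/asr"  # hypothetical endpoint

# Contextual biasing accepts any text format; here, a loose list of
# product names and hotwords the transcript is likely to contain.
context = """Qwen3-Next, Qwen3-VL, SWE-bench
hotwords: Tongyi, DashScope"""

with open("meeting.wav", "rb") as f:
    resp = requests.post(
        API_URL,
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        data={"language": "en", "context": context},  # hypothetical fields
        files={"audio": ("meeting.wav", f, "audio/wav")},
    )
print(resp.json())
```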
Qwen-Image-Edit is the editing version of the 20B Qwen-Image model. It offers precise, model-native editing, including bilingual text modification and both high-level semantic and low-level appearance changes.
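A minimal editing sketch with diffusers, assuming a dedicated QwenImageEditPipeline class and the Qwen/Qwen-Image-Edit repo ID (verify both against your diffusers version):

```python
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline  # class name assumed

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = Image.open("storefront.png").convert("RGB")
# Bilingual text modification: rewrite the sign, leave the scene intact.
edited = pipe(
    image=image,
    prompt='Change the sign text to "营业中 / OPEN"',
    num_inference_steps=50,
).images[0]
edited.save("storefront_edited.png")
```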
Qwen3-Coder is a new 480B MoE open model (35B active) by the Qwen team, built for agentic coding. It achieves SOTA results on benchmarks like SWE-bench, supports context lengths of up to 1M tokens, and ships with an open-source CLI tool, Qwen Code.
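At 480B total parameters, local inference is impractical for most setups, so the natural route is a hosted OpenAI-compatible endpoint; the base URL and hosted model name below are assumptions:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed
)

resp = client.chat.completions.create(
    model="qwen3-coder-plus",  # assumed hosted model name
    messages=[
        {"role": "system", "content": "You are an expert coding assistant."},
        {"role": "user", "content": "Write a Python function that topologically sorts a DAG."},
    ],
)
print(resp.choices[0].message.content)
```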
Qwen-Image is a new 20B open-source image foundation model by the Qwen team. It excels at complex text rendering (especially Chinese) and precise image editing, while also delivering strong general image generation. Available now in Qwen Chat.