The official Z.ai platform for experiencing our new, MIT-licensed GLM models (Base, Reasoning, Rumination). A simple UI focused on model interaction. Free.
This is the 10th launch from Z.ai.

GLM-5V-Turbo
Launched this week
GLM-5V-Turbo is Z.AI's first multimodal coding model. It understands images, video, files, and UI layouts, then turns that visual context into runnable code, debugging help, and stronger agent workflows with Claude Code and OpenClaw.




Launch Team
Flowtica Scribe
Hi everyone!
GLM-5V-Turbo is one of the more interesting coding model releases lately because it is not just "vision added onto a code model." @Z.ai is clearly positioning it as a native multimodal coding model that can understand screenshots, design drafts, videos, document layouts, and real interfaces, then turn that into code, debugging, and action.
"Seeing the screen and writing the code" is a very real workflow, and GLM-5V is built exactly for that.
It is also deeply adapted for @Claude Code and @OpenClaw style loops, which makes it feel much more relevant than a generic VLM with some coding demos on top.
Try it on chat.z.ai or plug in the official API.
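For the API route, here is a minimal sketch of what a multimodal request could look like. This assumes Z.ai exposes an OpenAI-compatible chat-completions payload and that "glm-5v-turbo" is the model identifier — both are assumptions here, so check the official API docs before relying on them. The code only builds the request body; it does not send it.

```python
import base64

def build_vision_request(image_bytes: bytes, prompt: str,
                         model: str = "glm-5v-turbo") -> dict:
    """Build a hypothetical multimodal chat request: one image plus a text instruction.

    Assumes an OpenAI-style message schema with `image_url` content parts;
    verify the real schema against Z.ai's API reference.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image first, encoded as a data URL
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                    # Then the coding instruction
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }

# Example: turn a UI screenshot into code (screenshot bytes elided)
payload = build_vision_request(b"\x89PNG...",
                               "Turn this mockup into React + Tailwind.")
```

You would then POST this payload to the chat-completions endpoint with your API key.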
a few months ago, @Claude by Anthropic announced Opus 4.5 and we thought they had won the AI coding race. then @MiniMax released M2.7, and now GLM-5V-Turbo by @Z.ai.
open source is so back.
pro tip: you can try this new model with @Kilo Code and @KiloClaw
this looks exciting! we struggle with creating vector diagrams that we can embed in a website. they generally start as a sketch on paper, and right now the process of getting them onto our site is very cumbersome. can the model help with sketch-in -> .svg-out?
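For a sketch-in -> SVG-out workflow like the one asked about above, a common pattern is to prompt the model to respond with bare SVG markup and then sanity-check the reply before embedding it. The prompt wording and the `extract_svg` helper below are hypothetical illustrations, not part of any official tooling; whatever client call actually sends the sketch image to the model is elided.

```python
import xml.etree.ElementTree as ET

# Hypothetical prompt: constrain the model to SVG-only output so the reply
# can be embedded directly.
SVG_PROMPT = (
    "Convert this hand-drawn diagram into a clean, minimal SVG. "
    "Respond with only the <svg>...</svg> markup, no prose."
)

def extract_svg(model_reply: str) -> str:
    """Pull the <svg>...</svg> span out of a model reply and verify it parses.

    Models sometimes wrap output in extra prose, so we slice out the SVG
    span and confirm it is well-formed XML before using it.
    """
    start = model_reply.find("<svg")
    end = model_reply.rfind("</svg>")
    if start == -1 or end == -1:
        raise ValueError("no SVG found in model reply")
    svg = model_reply[start:end + len("</svg>")]
    ET.fromstring(svg)  # raises ParseError if the markup is malformed
    return svg
```

The XML parse is a cheap structural check only; it won't catch a diagram that is well-formed but visually wrong, so a human look at the rendered result is still needed.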
The "video → runnable code" claim is the one I want to pull on. Are we talking about screen recordings of a UI workflow, where the model watches what a user does and generates automation code from that? Or is video support more like "static frames extracted and analyzed sequentially"? Those are very different capabilities with very different use cases.
Vision-to-code is a fascinating direction. We use a simpler version of this in Krafl-IO — users upload an image and our AI describes it, then generates a LinkedIn post around it. Going from visual context to structured output is harder than it looks. Curious how GLM-5V handles ambiguous UI elements where the "right" code depends on intent, not just layout.