DeepSeek-VL2

MoE Vision-Language, Now Easier to Access


DeepSeek-VL2 is a family of open-source vision-language models with strong multimodal understanding, powered by an efficient MoE architecture. Easily test them out with the new Hugging Face demo.
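The efficiency claim is the usual MoE one: a learned router sends each token to only a few expert networks, so only a fraction of the total parameters is active per forward pass. Here is a toy sketch of generic top-k routing to make the idea concrete; it is not DeepSeek's actual DeepSeekMoE implementation, just an illustration of the mechanism:

```python
# Toy top-k mixture-of-experts layer. NOT DeepSeek's implementation
# (DeepSeek-VL2 uses the DeepSeekMoE architecture); this only shows
# why an MoE model activates far fewer parameters than it stores.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        # 8 expert MLPs are stored, but each token will use only 2.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(dim, n_experts)  # router
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.gate(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = ToyMoE()
print(moe(torch.randn(10, 64)).shape)  # each token touched only 2 of 8 experts
```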

Zac Zuo
DeepSeek made waves with their R1 language model, but their multimodal capabilities (especially image understanding) haven't been as strong. They are evolving rapidly, though. DeepSeek-VL2, their new open-source family of Mixture-of-Experts (MoE) vision-language models, is a big step forward, achieving strong performance with a much smaller activated parameter count thanks to its MoE design.

And the exciting news: there is a new Hugging Face Spaces demo, so you can now try these models without a heavy deployment (normally you would need more than 80GB of GPU memory, which is out of reach for most of us). Check it out and see what DeepSeek brings next to surprise everyone :)
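If you'd rather poke at the demo from a script than through the web UI, Gradio-based Spaces can be queried from Python with `gradio_client`. A minimal sketch follows; the Space ID and the endpoint name/arguments are assumptions on my part, so call `view_api()` first to see what the demo actually exposes:

```python
# pip install gradio_client
from gradio_client import Client, handle_file

# Assumed Space ID -- check the actual demo URL on Hugging Face.
client = Client("deepseek-ai/deepseek-vl2-small")

# Prints the Space's real endpoints and their expected arguments.
client.view_api()

# Hypothetical call shape; the real api_name and parameters depend
# on how the demo's Gradio app is defined.
result = client.predict(
    handle_file("cat.jpg"),           # image input
    "Describe this image in detail.",  # text prompt
    api_name="/chat",
)
print(result)
```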
Tan Hang
@zac_zuo It's still going; I'm optimistic about it
Jim Engine
Awesome, that's good news. I did not like the "No text extracted" error when I tried to upload images of any kind...
Shivam Singh
Wow! The speed at which DeepSeek is evolving is really mind-blowing. Congrats on the launch and sending wins to the team :)
Q
Great models from DeepSeek! Evolving fast, great value for us!
Otman Alami
For a fair performance comparison, you should add Qwen 2.5 VL 7B as well
Ray Wang
DeepSeek's greatest value is that R1 costs next to nothing compared to ChatGPT o1. There are still no good use cases for "middle detail" vision models yet (good enough to identify things in pictures, but not good enough to drive a car, etc.). Awesome release though.
Tanmay Parekh
All the best for the launch @zac_zuo!