LLaVA-Mini
LLaVA-Mini: Efficient Image and Video Large Multimodal Models
9 followers
LLaVA-Mini 👏 is an efficient large multimodal model (LMM) for image and video understanding that uses 1 vision token. It offers: (1) ⏩ fast response (40 ms per image) and (2) 🖥️ lower VRAM usage (supports 3-hour video understanding on a 24 GB GPU).

Free
