Vily

AI Researcher. Building LLMs & LMMs.
Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across any combination of modalities.
Stream-Omni: GPT-4o-like Chatbot
Stream-Omni is an end-to-end language-vision-speech chatbot.
LLaVA-Mini 👏 is an efficient LMM for image and video understanding that uses only one vision token per image, offering: (1) ⏩ fast responses (40 ms per image) and (2) 🖥️ lower VRAM usage (it supports 3-hour video understanding on a 24 GB GPU).
LLaVA-Mini
LLaVA-Mini: Efficient Image and Video Large Multimodal Models