Vily's profile on Product Hunt

All activity

Vily hunted Stream-Omni: GPT-4o-like Chatbot

9mo ago

Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across any combination of modalities.

Stream-Omni: GPT-4o-like Chatbot

Stream-Omni is an end-to-end language-vision-speech chatbot.

Vily left a comment

1yr ago

LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports the understanding of images, high-resolution images, and videos. Guided by the interpretability within LMM, LLaVA-Mini requires only 1 token to represent each image, which improves the efficiency of image and video understanding, including computational effort (77% FLOPs reduction), response latency (reduce...

LLaVA-Mini: Efficient Image and Video Large Multimodal Models

Vily hunted LLaVA-Mini

1yr ago

LLaVA-Mini 👏 is an efficient LMM for image/video understanding using 1 vision token, offering: (1) ⏩ fast response (40 ms per image); (2) 🖥️ lower VRAM usage (supports 3-hour video understanding on a 24 GB GPU).

LLaVA-Mini: Efficient Image and Video Large Multimodal Models