
Cloudglue
Let your AI understand videos and audio
157 followers
Cloudglue APIs are the easiest way to transform video & audio into LLM-ready data. Build AI agents that can finally see and hear, and complete your organization's knowledge base. Lightning-fast and developer-first APIs with cutting-edge video understanding.

Cloudglue
Hey Product Hunt! 👋 This is Amy, Kevin, and Matt. We are thrilled to launch Cloudglue today to help unlock the world's video data!
Why we built it
Videos contain some of an organization's richest and most up-to-date knowledge: sales meetings, product demos, session recordings, and more.
But video knowledge remains largely untapped. Today, incorporating video insights into AI involves wrestling with vision models, ML pipelines, and solving search challenges, while balancing costs and development efforts.
We were driven to solve this problem.
Our journey
Over the last 6 months, we've focused intensively on advancing video understanding through cutting-edge research, and drawing on our experiences building infrastructure, search, and video ML at AWS and Snap to create the APIs we wished existed.
(Our amazing design partners have also been instrumental in shaping what we're launching today! 💫)
💫 Introducing Cloudglue
Simple, lightning-fast, developer-friendly APIs with cutting-edge video understanding at their core: Cloudglue makes it effortless to add video insights to your AI.
We've made it as easy to tap into video knowledge as text documents, enabling AI agents to see and hear as well as humans can.
How it works:
For Developers: Extract structured data, generate multimodal transcripts, or chat with your entire video collection, all through simple APIs.
For Everyone: Our MCP server lets you interact with your video collections easily. In minutes, you can ask Claude to "analyze last week's sales meetings and prepare a report".
Yes, Cloudglue turns any LLM multimodal!
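To make the "chat with your video collection" idea concrete, here's a minimal sketch of what a request body for collection-level Q&A might look like. Everything in this snippet is a hypothetical placeholder for illustration: the function name, the `collection` field, and the chat-completions-style message shape are assumptions, not the actual Cloudglue API (see docs.cloudglue.dev for the real request format).

```python
import json

# Illustrative only: the post names three capabilities (structured
# extraction, rich transcripts, and chat over a collection). Every
# field name below is a hypothetical placeholder, not the real API.
def chat_over_collection(collection_id: str, question: str) -> dict:
    """Build a hypothetical chat-completions-style request body for
    Q&A over an entire video collection."""
    return {
        "collection": collection_id,  # which video collection to query
        "messages": [
            {"role": "user", "content": question},
        ],
    }

req = chat_over_collection(
    "sales-meetings",
    "Analyze last week's sales meetings and prepare a report",
)
print(json.dumps(req, indent=2))
```

The same question could just as well be asked through the MCP server from Claude Desktop, which is the no-code path the post describes.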
We'd love your feedback!
This is our first public release, and we're eager to learn about your use cases and hear your feedback.
What have you tried building with video? What was difficult?
Which videos in your organization remain untapped?
What kind of insights are you looking to unlock across your video collections?
What's missing and what breaks?
We're listening and fixing fast! We'd also love for you to try our MCP integration: it takes five minutes to set up with Claude Desktop, and it feels like magic. Let us know what you think!
We're excited to usher in an era where AI agents can truly see and hear. We can't wait to see what you'll create with Cloudglue!
- Amy, Kevin, Matt
------
Cloudglue is free for everyone to try! https://cloudglue.dev
Follow us on X for the latest updates: https://x.com/cloudgluedev
Join our Discord community and share what you're building: https://discord.gg/QD5KWFVner
Cloudglue
@kevinstyle Woohoo excited to get your feedback Kevin!
@amyxst Congrats on the launch! 🎉 The journey to get here must have been quite the ride. We're actually collecting founder stories like yours at FounderJourneys - would love to feature how you went from idea to launch day and beyond if you're interested!
Cloudglue is an exciting breakthrough! The idea of turning video and audio content into structured, LLM-ready data will truly unlock a wealth of knowledge that’s often hidden in those files. Can it also generate contextual analysis or summaries from video content, not just transcripts?
Cloudglue
@evgenii_zaitsev1 thanks!
We generate contextual analysis and summaries from more than just transcripts! You can try out our playground to see what it can do, and I'm also happy to tell you more!
Cloudglue
@evgenii_zaitsev1 Hi Evgenii - thank you for dropping by!
Absolutely! Cloudglue is fully multimodal and offers three ways to get contextual analysis:
Rich transcriptions - when generating a rich transcript, toggle on the `enable_summary` flag to get a video-level summary (https://docs.cloudglue.dev/api-reference/endpoint/transcribe/post#body-enable-summary). It'll be derived from all the multimodal insights from the video - if you have all modalities switched on!
Extract - returns the entities you request, grounded in all the multimodal signals from the video
Chat completions - you can flexibly Q&A on your video to generate any kind of report. I'd highly recommend trying our MCP server for this one!
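As a rough sketch of the first option: the `/transcribe` endpoint and the `enable_summary` body flag come from the API reference linked above, but the base URL, auth scheme, and the `url` field name here are assumptions for illustration, so check the docs for the real request shape.

```python
import json

# Assumed base URL for illustration; only POST /transcribe and the
# `enable_summary` body flag are confirmed by the linked API reference.
TRANSCRIBE_ENDPOINT = "https://api.cloudglue.dev/v1/transcribe"

def build_transcribe_body(video_url: str, enable_summary: bool = True) -> str:
    """Serialize a rich-transcription request body. With the documented
    `enable_summary` flag on, the response should include a video-level
    summary derived from all enabled modalities."""
    return json.dumps({
        "url": video_url,                  # hypothetical field name
        "enable_summary": enable_summary,  # documented flag
    })

body = build_transcribe_body("https://example.com/demo.mp4")
print(body)
# Send with your HTTP client of choice, e.g.:
# requests.post(TRANSCRIBE_ENDPOINT,
#               headers={"Authorization": "Bearer <API_KEY>"},
#               data=body)
```

The actual HTTP send is left commented out since it needs a real API key; the payload-building step is the part the flag documentation describes.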
Opkit
11x engineering manager here. We use Cloudglue to add video support to our knowledge base, so Alice can "learn" from past sales calls. It's been very solid so far!
Cloudglue
@shcallaway Love to hear it! 🙌
Cloudglue
@shcallaway It's been great working with 11x to tackle the unique challenges of sales recordings. Thanks for trusting us to enable video knowledge for Alice!
Cloudglue
Also check out our MCP server walkthrough! (Since we could only include one video above!)
No-code, video copilot - we'd love for you to try it out!
Setup: https://docs.cloudglue.dev/getting-started/mcp-server
Cloudglue
@amyxst imagine all the possibilities!
Graphite
Congrats @amyxst & team! Every large company I talk to is working to connect and centralize their knowledge bases and context for LLMs. So much of that context lives in videos from customer calls, demos, internal meetings, and more, and Cloudglue unlocks all of it! What have been the most interesting or surprising use cases you've seen so far?
Cloudglue
@merrill_lutsky 100%! This was a large part of what drove us to build Cloudglue: video has to be understood.
There have been so many interesting ones!
Real-time coaching for sales calls
Capturing screen recordings for debugging - like having a live pair programmer
Social listening on the "latest in AI" from YouTube
We are so excited to see what folks will create. We'll be putting out more interesting use cases as demos in the coming weeks!
Hey Cloudglue team, congrats on the launch! The playground and dev integration are impressively smooth.
Quick question on the /transcribe endpoint: for YouTube videos with enable_summary=true, the summary returned seems to be the original YouTube description. Is this expected (perhaps due to the 'speech level understanding only' note for YouTube sources), or should a new summary be generated from the speech content? Thanks!
Cloudglue
If you have any questions or want to chat with us, feel free to drop into our discord!
https://discord.gg/QD5KWFVner