Launched this week

Clipto
Fully local, natural language search over terabytes of media
805 followers
Like Google Photos, but fully local. Turn the terabytes of video, audio, meetings, and files you work with into searchable memories, without uploading anything to the cloud. Clipto automatically tags people, dialogue, and scenes, so you can instantly find any moment buried in your media just by describing what you're looking for. It's fast too: on a MacBook Pro M5, Clipto indexed 2TB of videos in just 24 hours.

ComputerX
So I can just dump all my messy folders into the app and let the AI do the heavy lifting? No manual tagging or renaming required at all?
Clipto
@bruceyongli Yes — just drag your local media files into Clipto.
From there, Clipto will watch and listen to the content for you. It can recognize basic media information like format, resolution, frame rate, and aspect ratio — for example MP4 or MOV files, 4K or 1080p footage, 24fps or 30fps, widescreen or vertical clips.
It can also understand what’s inside the content, including people, actions, dialogue, scenes, objects, and more. During indexing, Clipto automatically adds multi-dimensional tags to your media files, so in most cases, you don’t need to manually rename, tag, or organize everything first.
Does it support video transcription search?
Clipto
@nithin_raju1 Yes, Clipto supports video transcription search. Beyond transcription, the AI can also generate a summary of your video. Click into any single file to open its detail page, where you’ll see the transcript, summary, and a chat box. From there, you can ask questions about the video, and the AI will answer based on the content of that specific file. :)
Zoer.ai
As a filmmaker handling sensitive footage, I’ve been waiting for something exactly like this. No uploads = no leaks.
Clipto
@shgjj9 Filmmakers are actually one of the core user groups we had in mind when building Clipto.
And yes, because everything runs locally, your footage doesn’t need to be uploaded anywhere. That also means you can keep using Clipto in places with poor or no internet connection, like on a mountaintop set, on a plane, or anywhere in the field. Hope Clipto can help make your footage workflow a lot easier. :)
InsForge
Wait, so I can search for 'man wearing red hat' across all my raw footage and it just... finds it? Locally? That’s wild.
Clipto
@jiaqichen Yes. Semantic search for visual content, locally, is exactly what we built~
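For readers wondering what local semantic search can look like in general, here is a minimal sketch. It is not Clipto's implementation, which the thread does not describe. It assumes each indexed moment already has an embedding vector and that the text query has been embedded into the same space; the tiny 3-dimensional vectors, the clip names, and the `search` function are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical local index: one embedding per indexed moment.
# Real embeddings would come from a vision/text model and have
# hundreds of dimensions; these values are made up for illustration.
CLIP_INDEX = {
    "clip_001.mov@00:12": [0.9, 0.1, 0.0],
    "clip_002.mov@03:40": [0.1, 0.8, 0.2],
    "clip_003.mov@01:05": [0.7, 0.3, 0.1],
}

def search(query_vec, index, top_k=2):
    """Return the keys of the top_k moments most similar to the query."""
    scored = [(cosine(query_vec, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:top_k]]

# Stand-in for the embedding of a query like "man wearing red hat".
results = search([1.0, 0.0, 0.0], CLIP_INDEX)
print(results)
```

Everything in the sketch runs in-process on in-memory data, which is the property that matters here: no media or query leaves the machine.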
This is super cool. I wanted to ask you a question: how does it deal with very weak hardware and devices?
Clipto
@sam_alghaithi Great question! Supporting a wide range of hardware has actually been one of our biggest engineering challenges.
We approach it from two directions.
First, at the model layer, we use different model tiers optimized for different classes of machines. Depending on the available hardware, Clipto can choose between smaller and larger models to balance quality, speed, and resource usage.
Second, we’ve spent a lot of time optimizing the orchestration layer. Different workloads are scheduled differently depending on the machine’s capabilities.
On high-end systems, we can take advantage of more parallelism and process media much faster. On lower-powered machines, the priority shifts toward stability and responsiveness, making sure indexing doesn’t overwhelm the computer or interfere with normal work.
There’s still room for improvement, but a lot of the engineering effort has gone into making local AI practical on real-world hardware rather than assuming everyone has a top-spec machine.
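The two directions described above (model tiers plus hardware-aware scheduling) could be sketched roughly as follows. This is an illustration only: the tier names, the RAM and core thresholds, and the `choose_plan` function are assumptions, not Clipto's actual logic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    model_tier: str  # hypothetical tier name: "small", "medium", or "large"
    workers: int     # number of parallel indexing workers

def choose_plan(ram_gb: float, cpu_cores: int) -> Plan:
    """Pick a model tier and parallelism level for a machine.

    All thresholds below are illustrative assumptions.
    """
    if ram_gb >= 32 and cpu_cores >= 10:
        # High-end machine: larger model, most cores used for indexing,
        # with two cores left free for the rest of the system.
        return Plan("large", workers=max(1, cpu_cores - 2))
    if ram_gb >= 16:
        # Mid-range machine: use half the cores so normal work stays responsive.
        return Plan("medium", workers=max(1, cpu_cores // 2))
    # Low-powered machine: smallest model, one worker, stability first.
    return Plan("small", workers=1)

print(choose_plan(ram_gb=128, cpu_cores=16))
print(choose_plan(ram_gb=8, cpu_cores=4))
```

Keeping the decision in one pure function makes it easy to test against many hardware profiles without needing the machines themselves.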
@henry_kang Congrats, this is truly an engineering marvel. I once dealt with issues like this, and it took me a long time because I was focused on mobile devices, which is an entirely different challenge. Very cool, I’ll be sure to test it out!
Clipto
@sam_alghaithi Thanks! Please do let me know how it performs on your device. I am very curious to see.
@henry_kang It performed really well, and with my powerful laptop, everything ran smoothly. My MacBook has 128GB of RAM, and it was great.
This looks like a great concept. I'd definitely love to give it a try, especially since I spend a lot of time switching between notes, recordings, and transcripts. One question though: when can we expect support for Pixel phones? That would make it an easy download for me.
Clipto
@vikranth_reddy_bollam Thanks! We’d love to have you try it as well.
Pixel support is definitely something we’re interested in, but today our focus is on Mac. The product is fairly compute-intensive, and we wanted to start on a platform where we could deliver the best possible experience before expanding further.
We’ve spent a lot of effort optimizing for Apple Silicon and making large-scale media indexing practical on a local machine.
Mobile devices are absolutely on our radar, but for now we’re focused on continuing to improve the Mac experience first and then evaluating the best path to bring Clipto to other platforms.
Out of curiosity, what’s your primary use case on a Pixel? Notes, recordings, podcasts, or something else?
Does the natural language search get better over time through local fine-tuning, or is the model static upon installation?
Clipto
@ea_z Great question.
To clarify, Clipto does not perform local model fine-tuning on your personal media library today. The underlying models themselves aren’t continuously retrained on-device.
What we do have is a local data flywheel. As you use Clipto, your interactions, edits, labels, and feedback help the system build a better understanding of your media and preferences over time.
For example, if you consistently organize content a certain way, rename detected people, or make specific editing decisions, those signals can be incorporated into future retrieval and understanding workflows.
So while the model weights remain unchanged, the system itself becomes increasingly personalized as it accumulates more context about your library and how you work with it.
We think that’s often more valuable than fine-tuning alone because it allows the experience to improve while keeping your personal media private and local.
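One way to picture personalization without fine-tuning is a re-ranking step layered on top of a static model's scores. The sketch below is an assumption about how such a flywheel could work in general, not a description of Clipto's internals; `FeedbackStore`, `rerank`, the boost weight, and the cap are all hypothetical.

```python
from collections import Counter

class FeedbackStore:
    """Locally stored interaction counts; nothing here touches model weights."""

    def __init__(self):
        self.clicks = Counter()

    def record_click(self, query, item):
        self.clicks[(query, item)] += 1

    def boost(self, query, item, weight=0.05, cap=0.25):
        # Small, capped bonus so feedback nudges results without
        # overriding the model's own relevance score.
        return min(cap, weight * self.clicks[(query, item)])

def rerank(query, base_scores, store):
    """Combine static model scores with local feedback signals."""
    adjusted = {
        item: score + store.boost(query, item)
        for item, score in base_scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)

store = FeedbackStore()
base = {"a.mov": 0.80, "b.mov": 0.75}  # scores from the unchanged model
before = rerank("beach sunset", base, store)

# The user opens b.mov twice for this query; the signal stays on-device.
store.record_click("beach sunset", "b.mov")
store.record_click("beach sunset", "b.mov")
after = rerank("beach sunset", base, store)
print(before, after)
```

The model's scores in `base` never change; only the locally stored counts do, which is what lets results shift toward the user's habits while the weights stay fixed.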