Chris Messina

Clipto - Fully local, natural language search over terabytes of media

Like Google Photos, but fully local. Turn the terabytes of video, audio, meetings, and files you work with into searchable memories, without uploading anything to the cloud. Clipto automatically tags people, dialogue, and scenes, so you can instantly find any moment buried in your media just by describing what you're looking for. It's fast too: on a MacBook Pro M5, Clipto indexed 2TB of videos in just 24 hours.

Yiyao Liu

Is it really as simple as dragging clips in and typing what you want? If so, this is going to save video editors hours of organizing every week!

Matthewwei

@new_user___1282025165cc92287e7a197 Yes, that was exactly the aha moment for me when I first tried it. I was editing a short drama series at the time, and I immediately felt how much time it could save :)

Candice

As a creator signing strict NDAs for commercial projects, cloud tools are out of the question. Is there really no cloud rendering or uploading involved at all?

Matthewwei

@panwangqun Yes, Clipto is built around local-first processing. Your media analysis and search run on your device, so your footage doesn’t need to be uploaded to the cloud or rendered on our servers. For NDA-sensitive projects, you can even keep the workflow fully offline and use Clipto without an internet connection. Hope Clipto can help with that:)

Daniel

Congrats! @henry_kang

Henry Kang

@danielwayne Thanks Daniel!

Hanzhi Zhang

Is there a maximum file size or duration limit for a single clip? I frequently work with 3-hour-long uncompressed theater recordings and want to make sure the local database won't crash during indexing.

Matthewwei

@hanzhizhang0405 There isn’t a fixed file size or duration limit for a single clip. Clipto can index long recordings like a 3-hour theater capture, but the processing time will depend on your machine and the length of the video.
One useful note: on the same machine, Clipto’s indexing time usually depends more on video duration than on file size. File size can matter in some cases, such as when transcoding is involved, but it’s usually not the main factor.

xWang

If I have duplicate files or very similar takes of the same scene, how does Clipto display them in the search results? Does it group them together?

Henry Kang

@zephyrlink_i Great question.

Exact duplicate files are automatically deduplicated during indexing, so we don’t process or store the same file multiple times.
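
As a rough illustration of exact-duplicate detection in general (an assumption about one common approach, not a description of Clipto's internals), files can be compared by a hash of their contents, so two copies with different names count as one. The names `file_digest` and `unique_files` below are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large media files never load fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def unique_files(paths):
    """Keep the first path seen for each distinct content digest (hypothetical helper)."""
    seen = {}
    for path in paths:
        path = Path(path)
        # setdefault keeps the earliest path and ignores later byte-identical copies.
        seen.setdefault(file_digest(path), path)
    return list(seen.values())
```

Similar takes and alternate angles have different bytes and therefore different digests, so a check like this would only ever collapse byte-identical copies.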

For similar takes, alternate angles, or near-duplicate shots, we currently keep them as separate results and rank them based on relevance to the search query.

In practice, that’s often what creators want. When you’re editing, multiple takes of the same scene can have subtle differences in framing, timing, performance, or camera movement, so seeing several strong matches side-by-side helps you quickly compare and choose the best shot.

That said, grouping similar results is something we’re actively exploring, especially for large productions with hundreds of takes. We think there’s a balance between reducing clutter and preserving creative choice.

@henry_kang @matthewwei the fully local processing is the whole product, not a feature. Routing all inference on-device means the Premiere Pro integration isn't a nice-to-have, it's the trust argument for any team that can't let footage leave the building. Congrats on shipping.

Henry Kang

@kjosephabraham Thanks, Joseph.

That’s exactly how we think about it. We never started with “let’s add a local mode.” We started with the belief that some of the most valuable media people own should never have to leave their machines in the first place.

For many creators, studios, and teams, local processing isn’t just about privacy. It’s about maintaining ownership, reducing dependency on cloud infrastructure, and being able to work wherever the media is.

Really appreciate the support.

We think the future of AI will be local-first much more often than people expect.

Zhengyang Hou

Is there a way to add manual tags or notes on top of the AI descriptions to customize the search for a specific client project?

Matthewwei

@zhengyang_hou Clipto uses auto-tagging by default.

The AI analyzes every media file you drag and drop in and automatically generates tags based on its content, so you can search by tag right away.

Milad Avaz

Can this be set up with TrueNAS?

Matthewwei

@miladavaz Yes, absolutely!

macOS natively supports both SMB and NFS. You can mount your TrueNAS storage as a local directory via SMB or NFS, and then use it with Clipto.

The overall workflow is: TrueNAS for storage → Mount via SMB/NFS to Mac → Clipto indexes that path.

bing

Impressed by the 99+ languages for ASR. Does the natural language search also work across different languages (e.g., searching in Spanish for a video with English dialogue)?

Matthewwei

@bing7 A very good question!

Yes, search supports cross-language retrieval. For instance, a query in Spanish can find videos that contain English dialogue.

This works today and surfaces plenty of relevant results. However, cross-language semantic matching is harder than matching within a single language, so its accuracy is not yet on par with single-language search. Colloquial or culturally specific expressions in particular may match less reliably.

We are continuously iterating to improve cross-language search. If you have specific search scenarios in mind, we can help you assess what results to expect.

Shake Lyu

How does it handle audio search for multiple speakers? If I search for a phrase, can I filter it by who said it, or does it just show the timestamp?

Matthewwei

@lvyanghuang Yeah, that's a great point ~

Search results show both timestamps and speaker labels.

Clipto first detects distinct voice characteristics automatically to tell speakers apart (for example, labeling them Speaker A and Speaker B).

You can then rename these speakers to real names or roles.

Most importantly, a rename applies globally: you change it once, and it carries over to that speaker's results across all media files, with no repeated setup.
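
To make the "rename once, apply everywhere" idea concrete, here is a minimal sketch. It assumes each transcript segment stores a stable speaker ID and that display names are looked up at read time; `Segment` and `SpeakerRegistry` are hypothetical names for illustration, not Clipto's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span of speech (hypothetical structure)."""
    media_file: str
    start_seconds: float
    speaker_id: str  # e.g. "Speaker A", assumed stable across files
    text: str


class SpeakerRegistry:
    """Maps speaker IDs to user-chosen display names."""

    def __init__(self):
        self._names = {}

    def rename(self, speaker_id: str, display_name: str) -> None:
        # A single entry here affects every segment that carries this ID.
        self._names[speaker_id] = display_name

    def label(self, segment: Segment) -> str:
        # Fall back to the automatic label when no rename exists.
        return self._names.get(segment.speaker_id, segment.speaker_id)


segments = [
    Segment("interview_01.mp4", 12.5, "Speaker A", "Let's get started."),
    Segment("interview_02.mp4", 301.0, "Speaker A", "Picking up from last time."),
    Segment("interview_02.mp4", 315.2, "Speaker B", "Sounds good."),
]

registry = SpeakerRegistry()
registry.rename("Speaker A", "Director")
labels = [registry.label(segment) for segment in segments]
```

Because segments keep only the ID, one rename updates the label shown for every file without touching the stored transcripts.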
