Clipto

Fully local, natural language search over terabytes of media

846 followers

Like Google Photos, but fully local. Turn the terabytes of video, audio, meetings, and files you work with into searchable memories, without uploading anything to the cloud. Clipto automatically tags people, dialogue, and scenes, so you can instantly find any moment buried in your media just by describing what you're looking for. It's fast too: on a MacBook Pro M5, Clipto indexed 2TB of videos in just 24 hours.

Free Options

Launch tags: Mac • Productivity • Artificial Intelligence

Launch Team

Osaurus

Hunter

We've been honing Clipto's story for a few months. At the end of our last call @henry_kang proved the value of the product.

He and his team were out in the desert, testing Clipto remotely: minimal reception, terabytes of footage sitting on his laptop, and he needed to find a specific shot for the launch video.

He searched for: "the wide drone shot where the car enters the desert".

He didn't want "a cinematic moment." Not a "vibes" search.

He knew he had the clip, but in the pre-Clipto world it would have taken hours of scrubbing through video to find it.

He found that clip in seconds using natural language to search over his own media, fully local.

Just like Google Photos — but nothing lives in the cloud.

This isn't an easy problem to solve. Henry's been pursuing this direction for over twenty years, since his time at CMU's Robotics Institute (my alma mater, FYI), where he began pushing the limits of computer vision. He started by indexing hundreds of images, then advanced to millions of objects, and watched recognition basically explode once memory scaled.

Clipto is in many respects the culmination of that work, pointed at your personal hard drive.

And it's quick: a modern M5 MacBook chews through ~2TB of video in about a day. Why not push yours through its paces?

2mo ago

Clipto

Maker

@chrismessina

Thanks Chris. 🙏

One thing we’ve learned from today’s discussions is that people aren’t really looking for “AI magic.”

They already know the clip exists.

They already own the footage.

They just need a reliable way to find it.

Whether it’s:

• the exact moment a decision was made in a meeting
• a specific quote from a podcast recorded months ago
• a particular shot buried in terabytes of footage

the common problem is the same:

our computers store everything, but remember nothing.

That’s ultimately what we’re building toward: a local memory layer for the media people already own.

2mo ago

@henry_kang I'm really glad that you mentioned the fact that people are not looking for AI magic - that's indeed true. Great stuff!

2mo ago

Clipto

Maker

@sk_uxpin Thanks! 🙌

2mo ago

ViralSort

This looks really interesting.

I'm curious about how deeply it understands media content.

Does it recognise things like camera angles, shot types (wide, medium, close-up), camera movements, transitions, B-roll, and multi-camera sequences?

It would be incredibly useful if I could search for something like "close-up shot of a person smiling" or "drone footage with a slow pan" and instantly find matching clips across my archive.

Would love to know how detailed the visual understanding gets beyond basic object and dialogue detection.

2mo ago

Clipto

Maker

@pradeepmalakar That's a really professional cinematography question. We're working hard to enrich our understanding of cinematic language to better serve professional video creators — here's what we can reliably recognize today:

• Shot Type: Wide Shot, Medium Shot (e.g. "wide shot of a city street" or "medium shot interview")
• Camera Angle: High Angle, Overhead/Top-down (e.g. "overhead shot of a table" or "high angle crowd scene")
• Framing & Composition: Landscape (e.g. "landscape framing outdoor scene")
• Scene & Setting: Urban/City, Green Screen/Studio, Day (e.g. "studio interview daytime" or "urban street scene")
• Technical Specs: AV1, Rec.709, 4:2:0, 8-bit, 25FPS (e.g. filter footage by codec or color space when you need format consistency in an edit)
• Focus & Quality: Out of Focus (e.g. quickly filter out unusable takes)

...and more. These are just a few examples across the many dimensions Clipto tags. Sorry I can't list them all here! Every case shown in our demo video is real.

Camera movements, transitions, B-roll classification, and multi-camera sequences are on the roadmap, and we're heads-down on them.

Would love to hear which specific search queries matter most to your workflow. It really helps us understand what to build next :)

2mo ago

'like Google Photos but fully local' framing is clean, but Google Photos works because the index follows you across devices seamlessly. curious how Clipto handles the multi-device problem. if i index 2TB on my MacBook and then want to search from my iPad or a second machine, what does that look like? is the index portable, or does each device need to reindex independently? that changes the use case significantly for anyone with more than one machine

2mo ago

Clipto

Maker

@ansari_adin That’s a very insightful question.

Today, Clipto works independently on each machine. The index is built locally and stays local, so if you index 2TB of media on your MacBook, that index currently doesn’t automatically appear on another device.

We made that tradeoff intentionally because our first priority was privacy, local ownership, and offline usability.

That said, we completely agree that long-term memory becomes much more valuable when it can follow you across devices. Cross-device memory and synchronization are already on our roadmap, and we’re actively exploring ways to do that while preserving the local-first principles that make Clipto unique.

In many ways, this is one of the most interesting problems for us: how do you build a Google Photos-like memory layer without giving up control of your data to the cloud?

2mo ago

minimi

the 'store everything but remember nothing' line is the whole thing imo. the part people underrate is that the hard bit was never the search, its doing the indexing on-device without melting the laptop or quietly shipping stuff to a server, which is exactly why most tools just punt it to the cloud. respect for taking the harder path. one thing im curious about: once the first 2TB is indexed, is re-indexing incremental as you add footage, or does it re-chew the whole library? thats kind of the thing that decides whether this stays usable for anyone whose archive keeps growing

2mo ago

Clipto

Maker

@dhanishta_likhar That’s a very insightful observation.

We actually agree with your premise. For local AI products, you have to solve the hardest problem first. If you can’t make large-scale on-device indexing practical, everything else is just a demo.

As for indexing, it’s incremental. Once your library has been processed, Clipto only analyzes newly added or changed files. It doesn’t re-chew the entire archive every time.

A lot of our engineering work has gone into task scheduling, indexing pipelines, and resource management to make sure growing libraries remain practical over time.

That’s ultimately the difference between a product that works for a 50GB library and one that can keep scaling as your archive grows year after year.
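For readers wondering what "only analyzes newly added or changed files" can look like in practice, here is a minimal Python sketch. It is not Clipto's implementation: the names `snapshot` and `plan_incremental` are hypothetical, and treating file size plus modification time as a change fingerprint is an assumption made purely for illustration.

```python
import os

# Hypothetical fingerprint: (size in bytes, modification time in nanoseconds).
# Assumption for illustration only; a real indexer might hash content instead.
Fingerprint = tuple[int, int]


def snapshot(root: str) -> dict[str, Fingerprint]:
    """Walk a media folder and record a cheap fingerprint for every file."""
    result: dict[str, Fingerprint] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            result[path] = (st.st_size, st.st_mtime_ns)
    return result


def plan_incremental(
    previous: dict[str, Fingerprint],
    current: dict[str, Fingerprint],
) -> tuple[list[str], list[str]]:
    """Compare two snapshots.

    Returns (to_index, removed):
      to_index - files that are new or whose fingerprint changed
      removed  - files that were indexed before but no longer exist
    """
    to_index = sorted(p for p, fp in current.items() if previous.get(p) != fp)
    removed = sorted(p for p in previous if p not in current)
    return to_index, removed
```

In this sketch, an app would store the last snapshot next to its index, take a fresh snapshot on each run, and hand only `to_index` to the expensive analysis step, so an unchanged 2TB library costs a directory walk rather than a full re-analysis.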

2mo ago

Cool concept. Real question though — is this doing actual frame-by-frame visual understanding or is it metadata/transcript/keyframe analysis? Because the gap between those two is enormous for practical use. What I actually want: upload 10 raw clips of the same scene, AI watches them all, ranks by emotional resonance, suggests best cuts.

2mo ago

Clipto

Maker

@joe_rucker Great question. It’s not just metadata, transcripts, or simple keyframe extraction.

We combine multiple signals, including metadata, speech transcripts, visual understanding, and information extracted directly from the video stream.

That said, we also don’t do naive frame-by-frame analysis across every frame. At scale, that becomes extremely expensive while often adding little value. Instead, we use a more selective approach to identify and analyze the most informative moments within a video.

Our current focus is helping users find the right moments and clips from large media libraries as quickly as possible.

The workflow you described, where AI reviews multiple takes, ranks them, and suggests the best cuts, is a fascinating direction. While Clipto doesn’t currently optimize for edit recommendations, many of the underlying building blocks are already there.

Out of curiosity, what’s your editing workflow today? Are you using Premiere, Final Cut, Resolve, or something else? And how much of the selection process is still manual versus AI-assisted?

We’re spending a lot of time thinking about how AI agents and creative tools could work together, so I’d love to learn how you’re approaching it.
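To make the "selective rather than frame-by-frame" idea above concrete, here is a small Python sketch of one common way to pick informative frames: keep a frame only when it differs enough from the last frame that was kept. This is an assumption for illustration, not a description of how Clipto selects moments. The function names, the flat grayscale pixel lists, and the threshold value are all hypothetical.

```python
def mean_abs_diff(a: list[int], b: list[int]) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_informative(frames: list[list[int]], threshold: float) -> list[int]:
    """Return indices of frames worth analyzing.

    The first frame is always kept. A later frame is kept only if it differs
    from the most recently kept frame by at least `threshold`, so long static
    stretches collapse to a single analyzed frame.
    """
    if not frames:
        return []
    selected = [0]
    last_kept = frames[0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], last_kept) >= threshold:
            selected.append(i)
            last_kept = frames[i]
    return selected


# Toy example: five 4-pixel grayscale "frames" with two scene changes.
frames = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],          # nearly identical to frame 0, skipped
    [100, 100, 100, 100],  # big change, kept
    [101, 100, 100, 100],  # nearly identical to frame 2, skipped
    [0, 0, 0, 0],          # big change, kept
]
print(select_informative(frames, threshold=10))  # [0, 2, 4]
```

The trade-off in a scheme like this is the threshold: set it too high and short but meaningful moments get skipped, set it too low and the cost approaches plain frame-by-frame analysis.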

2mo ago

@henry_kang Thanks for the detailed breakdown - the selective approach makes sense at scale, naive frame-by-frame would be cost prohibitive fast. Current workflow is Descript for assembly with manual selection from Kling generations and Audio (as needed) from 11labs — probably 3-5 takes per scene, human judgment on which clip has the right emotional tone. The AI assists but the selection is still very much human. Launching something tomorrow myself in a completely different space, family memory and AI video. Would love any feedback when it's live.

1mo ago

Clipto

Maker

@joe_rucker Awesome! That sounds like a workflow we would use ourselves! Looking forward to your launch, and best of luck!

1mo ago

@henry_kang Thanks Henry - really appreciate it. The AI-native editing workflow is genuinely different from traditional tools, and I think it's underexplored. Would appreciate the feedback: https://www.producthunt.com/products/dreams-of-yesterday

1mo ago

One question on the indexing: most of the examples here are professional, but does this work for a personal or family media archive too? My real-world mess: tens of thousands of files spread across folders, iPhone and Canon EOS naming mixed together, some GoPro footage, some with location metadata and plenty without. Once it's all sitting on a drive or OneDrive, Apple's native location mapping doesn't help anymore, so all I'm left with is filenames and maybe a creation date. Does Clipto index that kind of unstructured personal pile and make it searchable by what's actually in the footage, regardless of filename or missing metadata?

1mo ago

Clipto

Maker

@michael_zorez Yes, absolutely.

While many of our examples come from professional media workflows, this is actually a problem we think about a lot.

In many ways, a family archive is even harder than a professional one. You have photos and videos scattered across phones, cameras, external drives, cloud folders, and years of inconsistent naming conventions.

Clipto is designed to index the content itself, not just filenames or metadata. So even when metadata is missing or incomplete, it can still use visual content, people, dialogue, scenes, objects, and other signals to make the library searchable.

That’s why we’re particularly interested in the “messy archive” use case. Most people don’t have a well-organized media library. They have exactly what you described: tens of thousands of files accumulated over years.

When you’re trying to find something in that archive, what are the searches you most wish you could do today but can’t?

1mo ago

@henry_kang Exactly the "how would I use this" question I thought about when reading about Clipto.

Last year I tried to put together a photobook for my dad's 83rd. We live abroad so he doesn't get to see the kids much, and I wanted to give him something physical showing them growing up across different settings, school, trips, everyday moments.

The search I wished I had then was something like "one good photo of each of my girls, per year, across these settings."
What I actually did was scroll through years of camera rolls by hand, copy/pasting the useful ones into a shortlist folder, which I then went through a second time to select/crop the ones for the photobook. There was simply nothing I could ask to "show me the kids at this school event" or "find me the photos from trip xyz".

If Clipto can do "find photos of [person] at [kind of moment] over time" on an archive with no consistent file names, that's the feature that would have saved me a whole weekend. Literally.

1mo ago

Clipto

Maker

@michael_zorez This is one of the most compelling use cases I’ve heard so far.

What you’re describing isn’t really a search problem. It’s a memory problem.

The challenge wasn’t that the photos didn’t exist. It was that the context, relationships, and stories connecting them were buried across years of camera rolls and folders.

The query you wanted — “one good photo of each of my daughters, across different stages of their lives and different kinds of moments” — is exactly the kind of experience we think becomes possible when media is organized around people and memories instead of filenames and folders.

Thank you for sharing this story. It’s a great reminder that the most valuable archives are often personal ones!

1mo ago

@henry_kang happy to share if it can be of help in making tools like Clipto more targeted. Can't wait to try it 👍

1mo ago

Local-only across audio + video + files is the version of this I keep waiting for, congrats on shipping. The piece that usually breaks under real load is the indexing job, not the search itself. How are you handling the initial pass on someone with 5 years of meeting recordings? And does the index update incrementally or do new files queue behind the original backfill?

2mo ago

Clipto

Maker

@fabriziowexare Great question. We learned pretty early that indexing is actually a harder problem than search itself.

For large backfills (for example, years of meeting recordings), we’ve spent a lot of effort on scheduling, prioritization, and resource management.

The index is incremental. New files don’t have to wait for the entire historical backlog to finish processing.

If you’re indexing five years of recordings and a new meeting arrives today, Clipto will use a priority-based scheduling system to process the new content much sooner, rather than forcing it to sit behind a massive batch job.

Under the hood, we continuously balance long-running indexing tasks with newly arrived media so the system remains responsive while the library keeps growing.

Out of curiosity, how large is the media library you’re managing today? We’re seeing some users push well beyond the “normal” use case, which has been fascinating to learn from.
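As a rough picture of the priority-based scheduling described in this reply, here is a minimal Python sketch built on a heap. It is only an illustration under stated assumptions: the `IndexQueue` class, the two priority levels, and the file names are hypothetical and say nothing about Clipto's actual scheduler.

```python
import heapq
import itertools

# Hypothetical priority levels: lower numbers are processed first.
NEW_ARRIVAL = 0
BACKFILL = 1


class IndexQueue:
    """Priority queue of files waiting to be indexed.

    Entries are (priority, sequence, path). The sequence number keeps
    first-in, first-out order among files that share a priority.
    """

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._counter = itertools.count()

    def push(self, path: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), path))

    def pop(self) -> str:
        _priority, _seq, path = heapq.heappop(self._heap)
        return path

    def __len__(self) -> int:
        return len(self._heap)


# Five years of recordings are queued as backfill...
queue = IndexQueue()
for name in ["2021.mp4", "2022.mp4", "2023.mp4"]:
    queue.push(name, BACKFILL)

first = queue.pop()                           # backfill starts with 2021.mp4
queue.push("todays-meeting.mp4", NEW_ARRIVAL)  # a new meeting arrives mid-backfill
second = queue.pop()                          # it jumps ahead of the remaining backlog
print(first, second)
```

A plain first-in, first-out queue would make today's meeting wait behind every older recording. Ordering by priority first and arrival second is what lets fresh files be processed sooner without dropping the backfill.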

2mo ago
