Chris Messina

Clipto - Fully local, natural language search over terabytes of media

Like Google Photos, but fully local. Turn the terabytes of video, audio, meetings, and files you work with into searchable memories, without uploading anything to the cloud. Clipto automatically tags people, dialogue, and scenes, so you can instantly find any moment buried in your media just by describing what you're looking for. It's fast too: on a MacBook Pro M5, Clipto indexed 2TB of videos in just 24 hours.

Chris Messina

We've been honing Clipto's story for a few months. At the end of our last call @henry_kang proved the value of the product.

He and his team were out in the desert, testing Clipto remotely: minimal reception, terabytes of footage sitting on his laptop, and he needed to find a specific shot for the launch video.

He searched for: "the wide drone shot where the car enters the desert".

He didn't want "a cinematic moment." Not a "vibes" search.

He knew he had the clip, but in the pre-Clipto world it would have taken hours of video scrubbing to find it.

He found that clip in seconds using natural language to search over his own media, fully local.

Just like Google Photos — but nothing lives in the cloud.

This isn't an easy problem to solve. Henry has been pursuing this direction for over twenty years, starting at CMU's Robotics Institute (my alma mater, FYI), where he began pushing the limits of computer vision. He started by indexing hundreds of images, then advanced to millions of objects, and watched recognition basically explode once memory scaled.

Clipto is in many respects the culmination of that work, pointed at your personal hard drive.

And it's quick: a modern M5 MacBook chews through ~2TB of video in about a day. Why not push yours through its paces?

Henry Kang

@chrismessina 

Thanks Chris. 🙏

One thing we’ve learned from today’s discussions is that people aren’t really looking for “AI magic.”

They already know the clip exists.

They already own the footage.

They just need a reliable way to find it.

Whether it’s:

• the exact moment a decision was made in a meeting
• a specific quote from a podcast recorded months ago
• a particular shot buried in terabytes of footage

the common problem is the same:

our computers store everything, but remember nothing.

That’s ultimately what we’re building toward: a local memory layer for the media people already own.

Stan Kolotinskiy

@henry_kang I'm really glad that you mentioned the fact that people are not looking for AI magic - that's indeed true. Great stuff!

Henry Kang

@sk_uxpin Thanks! 🙌

Henry Kang

Hi Product Hunt! I’m Henry, founder of Clipto.

Clipto gives you the ability to search in natural language over terabytes of media in seconds.

Think: Google Photos, but fully local.

Twenty years ago at CMU’s Robotics Institute, I became obsessed with memory systems: what if computers could actually remember what they’ve seen?

I trained robots to memorize millions of product images crawled from the Amazon catalog (the standard back then was to index 100s of images at a time), and discovered that they could use that memory to recognize almost anything they encountered!

By pushing computers beyond their conventional limits, I had unlocked an explosion in machine intelligence.

Years later, the problem has become personal.

Our computers are full of valuable raw footage, interviews, recordings, and more, but most of that data is still painfully hard to search, revisit, or reuse. We are data-rich, but knowledge-poor.

That’s why I built Clipto. Clipto helps you find what matters inside terabytes of video, audio, meetings, and files, instantly, turning hours of repetitive work into seconds.

  • Find the wide drone shot where the cars enter frame.

  • Find the exact moment the sandstorm arrives in hours of footage.

  • And find what you know is in there, without suffering through hours of scrubbing.

Clipto's memory system lives where your data already is: on your device, under your control, available anytime, even offline, so you can keep working wherever and whenever inspiration strikes.

After two years of compressing, optimizing, distilling and orchestrating AI models to run entirely on-device, we are ready to share it with the Product Hunt community.

It’s still early, and it’s still compute-heavy. Right now, Clipto works best on higher-performance Apple Silicon Macs (M1 Pro/Max/Ultra and newer) with 24GB+ RAM. If you have a compatible Mac, we’d love for you to try it.

To celebrate our launch, we're offering 1 month free to anyone who signs up this week with code PHLNCH.

I’ll be here in the comments all day and would genuinely love to hear about the strategies you've developed to find your content diamonds in your digital rough.

Ansari Adin

'like Google Photos but fully local' framing is clean but Google Photos works because the index follows you across devices seamlessly. curious how Clipto handles the multi-device problem. if i index 2TB on my MacBook and then want to search from my iPad or a second machine, what does that look like. is the index portable or does each device need to reindex independently because that changes the use case significantly for anyone with more than one machine

Henry Kang

@ansari_adin That’s a very insightful question.

Today, Clipto works independently on each machine. The index is built locally and stays local, so if you index 2TB of media on your MacBook, that index currently doesn’t automatically appear on another device.

We made that tradeoff intentionally because our first priority was privacy, local ownership, and offline usability.

That said, we completely agree that long-term memory becomes much more valuable when it can follow you across devices. Cross-device memory and synchronization are already on our roadmap, and we’re actively exploring ways to do that while preserving the local-first principles that make Clipto unique.

In many ways, this is one of the most interesting problems for us: how do you build a Google Photos-like memory layer without giving up control of your data to the cloud?

Pradeep Malakar

This looks really interesting.

I'm curious about how deeply it understands media content.

Does it recognise things like camera angles, shot types (wide, medium, close-up), camera movements, transitions, B-roll, and multi-camera sequences?

It would be incredibly useful if I could search for something like "close-up shot of a person smiling" or "drone footage with a slow pan" and instantly find matching clips across my archive.

Would love to know how detailed the visual understanding gets beyond basic object and dialogue detection.

Matthewwei

@pradeepmalakar That's a really professional cinematography question. We're working hard to enrich our understanding of cinematic language to better serve professional video creators — here's what we can reliably recognize today:

  • Shot Type: Wide Shot, Medium Shot — e.g. "wide shot of a city street" or "medium shot interview"

  • Camera Angle: High Angle, Overhead/Top-down — e.g. "overhead shot of a table" or "high angle crowd scene"

  • Framing & Composition: Landscape — e.g. "landscape framing outdoor scene"

  • Scene & Setting: Urban/City, Green Screen/Studio, Day — e.g. "studio interview daytime" or "urban street scene"

  • Technical Specs: AV1, Rec.709, 4:2:0, 8-bit, 25FPS — e.g. filter footage by codec or color space when you need format consistency in an edit

  • Focus & Quality: Out of Focus — e.g. quickly filter out unusable takes

...and more. These are just a few examples across the many dimensions Clipto tags. Sorry I can't list them all here! Every case shown in our demo video is real.

Camera movements, transitions, B-roll classification, and multi-camera sequences are on the roadmap, and we're heads-down on them.

Would love to hear what specific search queries matter most to your workflow — it really helps us understand what to build next:)

Art Stavenka

Interesting. Local-first stops being a privacy story the second you can find a clip on your own drive faster than you'd find it in cloud storage. Question - what happens to the index when I rename or move a file in Finder after indexing? Does Clipto watch the filesystem?

Henry Kang

@artstavenka1 Great question.

Yes, Clipto watches the local filesystem and keeps the index in sync.

If you rename or move a file after it’s been indexed, Clipto will detect the change and update its references automatically, so the media doesn’t need to be re-indexed from scratch.

The heavy lifting (transcripts, visual understanding, embeddings, etc.) is already done, so we’re simply updating the file mapping rather than reprocessing the entire asset.

We designed it this way because media libraries are constantly evolving. People reorganize folders, rename projects, move files between drives, and we don’t want that to break search. Local-first only works if the index evolves with your library.
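Editor's note: the behavior described above (a rename or move updates the file mapping without redoing the heavy analysis) can be illustrated with a small sketch. This is not Clipto's implementation, which has not been published. The `MediaIndex` class, the `fingerprint` helper, and the choice of a full-content SHA-256 as file identity are all assumptions made for illustration; a real system would likely use a cheaper identity signal than hashing entire video files.

```python
import hashlib
import os
import tempfile


def fingerprint(path, chunk_size=1 << 20):
    """Hash the file's bytes so identity does not depend on name or location."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class MediaIndex:
    """Hypothetical index: analysis is keyed by content, paths are a thin mapping."""

    def __init__(self):
        self.analysis = {}      # fingerprint -> analysis results
        self.paths = {}         # fingerprint -> current path on disk
        self.analyze_calls = 0  # counts how often the heavy step ran

    def _analyze(self, path):
        # Stand-in for transcripts, visual understanding, embeddings, etc.
        self.analyze_calls += 1
        return {"size": os.path.getsize(path)}

    def observe(self, path):
        """Call when a file shows up at `path` (new file, rename, or move)."""
        key = fingerprint(path)
        if key not in self.analysis:
            self.analysis[key] = self._analyze(path)
        self.paths[key] = path  # known content: only the mapping changes
        return key


def demo():
    """Index a file, rename it, observe it again, and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "take_01.mov")
        new = os.path.join(tmp, "desert_wide.mov")
        with open(old, "wb") as f:
            f.write(b"fake video bytes")
        index = MediaIndex()
        key = index.observe(old)
        os.rename(old, new)
        index.observe(new)
        return os.path.basename(index.paths[key]), index.analyze_calls
```

Calling `demo()` shows the renamed file resolving to its new path while the analysis step has run only once, which is the property the reply describes.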

Joe Rucker

Cool concept. Real question though — is this doing actual frame-by-frame visual understanding or is it metadata/transcript/keyframe analysis? Because the gap between those two is enormous for practical use. What I actually want: upload 10 raw clips of the same scene, AI watches them all, ranks by emotional resonance, suggests best cuts.

Henry Kang

@joe_rucker Great question. It’s not just metadata, transcripts, or simple keyframe extraction.

We combine multiple signals, including metadata, speech transcripts, visual understanding, and information extracted directly from the video stream.

That said, we also don’t do naive frame-by-frame analysis across every frame. At scale, that becomes extremely expensive while often adding little value. Instead, we use a more selective approach to identify and analyze the most informative moments within a video.
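Editor's note: the reply does not say how the "most informative moments" are chosen. One common way to avoid analyzing every frame is frame differencing: keep a frame only when it differs enough from the last kept frame. The sketch below shows that technique on toy data. It is an assumption for illustration, not Clipto's method; the function names, the flat lists of grayscale values standing in for frames, and the threshold are all hypothetical.

```python
def mean_abs_diff(a, b):
    """Average per-pixel absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_informative_frames(frames, threshold=10.0):
    """Return indices of frames that differ enough from the last kept frame."""
    kept = []
    last = None
    for i, frame in enumerate(frames):
        if last is None or mean_abs_diff(frame, last) > threshold:
            kept.append(i)
            last = frame
    return kept


# Toy "video": each frame is a flat list of grayscale pixel values.
frames = [
    [0, 0, 0, 0],          # first frame is always kept
    [1, 0, 2, 1],          # nearly identical to frame 0: skipped
    [200, 200, 200, 200],  # large change: kept
    [198, 201, 200, 199],  # nearly identical to frame 2: skipped
    [50, 50, 50, 50],      # large change: kept
]
print(select_informative_frames(frames))  # [0, 2, 4]
```

Only the kept frames would then go through the expensive visual analysis, which is where the cost saving over naive frame-by-frame processing comes from.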

Our current focus is helping users find the right moments and clips from large media libraries as quickly as possible.

The workflow you described, where AI reviews multiple takes, ranks them, and suggests the best cuts, is a fascinating direction. While Clipto doesn’t currently optimize for edit recommendations, many of the underlying building blocks are already there.

Out of curiosity, what’s your editing workflow today? Are you using Premiere, Final Cut, Resolve, or something else? And how much of the selection process is still manual versus AI-assisted?

We’re spending a lot of time thinking about how AI agents and creative tools could work together, so I’d love to learn how you’re approaching it.

Joe Rucker

@henry_kang Thanks for the detailed breakdown - the selective approach makes sense at scale, naive frame-by-frame would be cost prohibitive fast. Current workflow is Descript for assembly with manual selection from Kling generations and Audio (as needed) from 11labs — probably 3-5 takes per scene, human judgment on which clip has the right emotional tone. The AI assists but the selection is still very much human. Launching something tomorrow myself in a completely different space, family memory and AI video. Would love any feedback when it's live.

Carlvert

Finally, a tool that respects our privacy. Since it's 100% local, does that mean absolutely zero data or telemetry is sent back to your servers?

Matthewwei

@carlvert That's exactly right. With 100% on-device processing, no data leaves your machine: no telemetry, nothing. That's the whole point of that mode.

If you choose Hybrid mode, some minimal data is used to enable cloud features like sync and collaboration — but that's opt-in, and clearly labeled when you set it up. Your choice, fully in your control:)

Sabber Ahamed

@carlvert  @matthewwei How do you track the performance of your product, then? What if something goes wrong or your users don't like it?

Matthewwei

@carlvert  @sabber_ahamed We have multiple user feedback channels in place to ensure we receive and address user issues as quickly as possible.

When something goes wrong, we step in immediately while keeping user information confidential.

All troubleshooting and solutions are carried out only with the user's explicit authorization.

Once authorized, our technical engineers will investigate the issue and work to resolve it promptly.

We take all user needs and suggestions seriously — every piece of feedback is heard, recorded, and used to drive continuous improvement.

Sabber Ahamed

@carlvert  @matthewwei Since no data leaves users' devices, you don't receive any information about the system unless users report and authorize issues through the feedback channel, right?

Jocky

Love the local-first philosophy! Does the single license cover multiple Macs, or do I need a separate seat for my studio desktop and my travel MacBook?

Henry Kang

@jocky Thanks! Today, most users run Clipto across their personal devices without friction.

We’re still refining some of the licensing and account management details as the product grows, especially for creators and teams who work across multiple machines.

Our goal is to make legitimate personal use feel simple, not burdensome.

Out of curiosity, when you switch between your studio desktop and travel MacBook, are you typically working with the same media library or different projects on each machine?

Jocky

@henry_kang Usually the same projects. Seamless access across devices is definitely important.

Henry Kang

@jocky That’s really helpful context.

What we’re hearing from more and more users is that search is only half the problem. Once people start building large media libraries, they want their memory and context to travel with them as well.

Today, each Clipto library is local to the machine, but seamless access across devices is definitely something we’re actively exploring.

When you’re switching devices, which matters more to you: accessing the same media assets, or having things like saved searches, people labels, and project context carry over?

Sandy Liu

Been following the journey on LinkedIn and so glad to see you guys finally launch! The search functionality is spooky good.

Henry Kang

Thanks @sandy_liusy Sandy! 🙏

Search has definitely become the heart of the product. There’s something magical about finding a specific moment from terabytes of media in seconds.

To be honest, I’m still surprised by some of the details Clipto catches. Every now and then it finds something in a frame that I completely missed while filming. 😄

JaredL

This is genuinely impressive — local-first AI search for video is something I didn't know I needed until now. The desert story really sold it for me.

Quick question: does Clipto index audio content like podcast recordings or interview transcripts the same way it handles video footage? I have hundreds of hours of recorded interviews and this could be a total game-changer for my workflow.

Henry Kang

@jaredl Absolutely. Video gets most of the attention, but Clipto works with audio just as well.

Podcasts, interviews, meetings, voice recordings, and other audio files are all indexed and made searchable. You can search across transcripts using natural language and jump directly to the relevant moments.

In fact, if you’re sitting on hundreds of hours of recorded interviews, that’s one of the strongest use cases for Clipto. Those recordings often contain valuable insights that are almost impossible to rediscover later without a system like this.

We’d love to hear how you’re managing and searching those interviews today.
