
Edit Mind
Search videos like you'd describe to a friend - 100% local
101 followers
Built for content creators, video editors, journalists, and video production companies. Search hours of footage instantly. No cloud. No switching your editing software.



Makerlapse
Hey,
I'm Ilias, a content creator and software developer. I've been creating content on YouTube for more than 3 years (80 videos, with an average duration of 1 hour). It has become harder for me to find video files and specific moments.
Then I got a bill of about $450 from the Google Video API for analyzing just a couple of videos.
So I built a local-first tool that:
* Transcribes your footage on-device
* Analyzes frames: faces, objects, scenes, and text on screen
* Indexes everything so you can search in plain language ("find me when I'm at my desk looking excited")
Now you can search in natural language to find the exact moment, using local AI models. I don't wanna upload my videos to the cloud just to get them indexed.
Start indexing your videos: https://edit-mind.com
@iliashaddad3 Hi Ilias, congrats on the launch, very cool tool. Do you centralize the index (1 query across your 80 vids) or is it individual?
Makerlapse
@zolani_matebese Thank you so much. It indexes each video scene individually (each scene is about 1 to 2 seconds long, depending on your preferences or the video duration). Then it searches across all video scenes to give you the right moment. Let me know if this answers your question.
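The per-scene indexing with one query across all videos described above can be sketched roughly as follows. This is a minimal illustration, not Edit Mind's actual code: the `Scene` class, the `split_into_scenes` and `search` helpers, the file names, and the word-overlap scoring are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    video: str      # source file name (hypothetical)
    start: float    # scene start, in seconds
    end: float      # scene end, in seconds
    text: str       # searchable text for the scene (hypothetical field)


def split_into_scenes(video: str, duration: float, scene_length: float = 2.0) -> list[Scene]:
    """Cut one video into consecutive fixed-length scenes with empty text."""
    scenes = []
    start = 0.0
    while start < duration:
        end = min(start + scene_length, duration)
        scenes.append(Scene(video, start, end, ""))
        start = end
    return scenes


def search(index: list[Scene], query: str, limit: int = 3) -> list[Scene]:
    """Rank every scene from every video by how many query words it contains."""
    words = set(query.lower().split())
    scored = [(len(words & set(s.text.lower().split())), s) for s in index]
    matches = [(score, s) for score, s in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in matches[:limit]]


# One shared index holds the scenes of all videos, so a single query covers them all.
index = split_into_scenes("vlog_01.mp4", 5.0) + split_into_scenes("vlog_02.mp4", 4.0)
index[1].text = "person at desk looking excited"
index[4].text = "person walking outside"
results = search(index, "excited at desk")
```

A real system would likely score scenes with embeddings rather than word overlap; the point here is only that scenes are stored individually but queried together.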
@iliashaddad3 Yes Tx. Very nice
Makerlapse
@zolani_matebese Awesome, thank you so much
ZeroHuman.
Congrats on the launch @iliashaddad3 and @rohanrecommends !
Can the user select multiple found moments and send them as a batch to the editing timeline?
Makerlapse
Thank you so much, @byalexai!
Yes, you can send multiple moments to the editing timeline, either using the search feature or the AI chat assistant.
Searching video like a conversation is a game-changer for editors. Does this index the visual elements of the frame, or is it primarily relying on the audio transcript?
Makerlapse
@rivra_dev Thank you. Yes, it indexes the visual elements of the frame: recognized faces, detected objects, and on-screen text. It also generates a description of each scene, in addition to the transcription.
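As a rough picture of what one indexed scene might hold, here is a small sketch that combines the signals listed above into a single searchable string. The `SceneRecord` class, its field names, and the sample values are hypothetical and are not taken from the Edit Mind codebase.

```python
from dataclasses import dataclass, field


@dataclass
class SceneRecord:
    faces: list[str] = field(default_factory=list)    # names of recognized faces
    objects: list[str] = field(default_factory=list)  # labels of detected objects
    on_screen_text: str = ""                           # text read from the frame
    description: str = ""                              # generated scene description
    transcript: str = ""                               # speech transcribed for the scene

    def to_document(self) -> str:
        """Join every non-empty signal into one string that a text search can index."""
        parts = [
            " ".join(self.faces),
            " ".join(self.objects),
            self.on_screen_text,
            self.description,
            self.transcript,
        ]
        return " ".join(p for p in parts if p)


record = SceneRecord(
    faces=["Ilias"],
    objects=["laptop", "desk"],
    on_screen_text="Episode 12",
    description="a man sitting at a desk",
    transcript="welcome back to the channel",
)
document = record.to_document()
```

Keeping visual and audio signals in the same record means a query can match a scene even when nothing relevant was said aloud.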
Love the concept as I have about 20TB of videos to search and sort through. Congrats on your launch. Besides adding to the integration with the various editing programs, what is on your upcoming roadmap for the foreseeable future?
Makerlapse
@st1100 Thank you so much! I'm currently working on the desktop app and optimizing it for Apple Silicon and for Windows machines with GPUs. I'm also improving indexing performance and speed, adding support for more video formats, and making it an essential part of the workflow for editors or anyone with a lot of videos.
Also, the project isn't only for video editors: I've received a lot of feedback and requests from non-editors who also have a lot of videos to index and search.
Also, I have a public roadmap for the self-hosted version: https://github.com/IliasHad/edit-mind/discussions/12
@iliashaddad3 Thank you for your prompt response and your link to the roadmap. I'm a Windows guy and am looking forward to the desktop app. Best of luck to you guys!
Makerlapse
@st1100 No worries. Thank you so much. Do you have a GPU in your Windows machine? If so, which model?
@iliashaddad3 I have a GeForce RTX 5070. Glad I got this before everything started going crazy....:o)
Hi Ilias, good luck. Which local models do you think work best with it?
Makerlapse
@sadegazoz Thank you! I've been using different models, since each one is good at a specific task: OpenAI Whisper for transcription, @YOLO for object detection, gpt-oss or qwen2.5:7b-instruct for the chat assistant and RAG, and DeepFace for facial recognition. I have a self-hosted version that is available on GitHub (https://github.com/iliashad/edit-mind)
description-as-search is the right framing for video, the failure mode is usually recall not precision. you describe "the bit where she laughs" and the system finds something close enough that you stop searching and miss the actual clip. how does edit mind surface "not sure, here are 3 candidates" vs just picking the top match? congratulations on your launch, good luck :)
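The thread does not say how Edit Mind handles this. One common pattern, sketched here purely as an assumption, is to return a single result only when the best score is clearly ahead of the runner-up, and several candidates otherwise. The `pick_candidates` function, the clip labels, the scores, and the `margin` threshold are all made up for illustration.

```python
def pick_candidates(scored, k=3, margin=0.1):
    """Return one result when the best match is clearly ahead, else the top k.

    scored: list of (label, score) pairs, where a higher score means a closer match.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if len(ranked) < 2:
        return ranked
    best, runner_up = ranked[0][1], ranked[1][1]
    if best - runner_up >= margin:
        return ranked[:1]      # confident: a single clear winner
    return ranked[:k]          # unsure: show several candidates


clear = pick_candidates([("clip_a", 0.91), ("clip_b", 0.55), ("clip_c", 0.40)])
unsure = pick_candidates(
    [("clip_a", 0.62), ("clip_b", 0.60), ("clip_c", 0.58), ("clip_d", 0.10)]
)
```

With these sample scores, `clear` holds only `clip_a`, while `unsure` holds the three closely ranked clips so the user can compare them instead of stopping at the first hit.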
minimalist phone: reduce your screentime
The whole picture looks like an intro from the series "Lie to Me" with Tim Roth. Detecting micro facial expressions etc. :)
Makerlapse
@busmark_w_nika Haha, love that reference. Thank you!