Hey everyone, I'm launching Dotient on Product Hunt this week and wanted to share what I've been building before the big day.
Dotient is a desktop app that lets you search your files by what they look like, not what you happened to name them two years ago. Everything runs completely offline, so nothing leaves your machine. It also has a live graph view of your files, deep PDF search, and canvas workspaces for organizing everything.
I built it solo out of pure frustration with Windows File Explorer and the lack of any tool that actually understands your files without making you do all the work manually.
Would love to hear any questions or feedback before launch. Happy to talk through anything.
Dotient
Local-first + offline visual search is the part I actually care about - most semantic file search tools quietly ship your content to an API, so doing the embeddings on-device is the real differentiator here. Two implementation things: is the index updated incrementally via a file watcher as files change, or is it a manual re-scan, and where does the embedding DB actually live on disk? And for the deep PDF search, are you running OCR on scanned/image-only PDFs, or only indexing PDFs that already have a text layer?
Dotient
@hi_i_am_mimo Hey Valeria, thanks for the comment! To answer your questions: there is a file watcher in place for a set of basic directories (Downloads, Documents, Pictures) and all folders within them. If a file ends up somewhere outside those directories, it is marked in a "missing" column in the SQLite DB. The file still shows its image, just with a missing path, and in these rare cases users can click on it and update the path to wherever the file was moved. The SQLite DB, which holds almost all your data, lives in 'C:\Users\[username]\AppData\Roaming\com.dotient.app' and the macOS equivalent of that path. The WebP thumbnails and embedding models also live there. Regarding PDF search, we index the text layer and also scan for images. There is no OCR indexing at the moment, but it may be planned for the future. In the meantime, you could highlight a region of the PDF that has text with the rectangle selection tool and probably get a result similar to searching for that text. The model does understand text to some extent, so you can search for large words that appear in images. Hope this was helpful!
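To make the missing-path fallback concrete, here is a minimal sketch of how a "missing" flag and a manual re-link could work on top of SQLite. The table name `files`, the `is_missing` column, and the helpers `open_index`, `mark_missing`, and `relink` are all hypothetical; Dotient's actual schema is not described in the thread.

```python
import os
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) a toy index. The schema is an assumption, not Dotient's."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               id INTEGER PRIMARY KEY,
               path TEXT NOT NULL UNIQUE,
               embedding BLOB,
               is_missing INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def mark_missing(conn):
    """Flag every indexed row whose path no longer exists on disk."""
    rows = conn.execute("SELECT id, path FROM files WHERE is_missing = 0").fetchall()
    gone = [(file_id,) for file_id, path in rows if not os.path.exists(path)]
    conn.executemany("UPDATE files SET is_missing = 1 WHERE id = ?", gone)
    conn.commit()
    return len(gone)


def relink(conn, file_id, new_path):
    """Point a missing row at its new location; the stored embedding is untouched."""
    if not os.path.exists(new_path):
        raise FileNotFoundError(new_path)
    conn.execute(
        "UPDATE files SET path = ?, is_missing = 0 WHERE id = ?",
        (new_path, file_id),
    )
    conn.commit()
```

Keying rows by a stable `id` rather than by path is what lets the thumbnail and embedding survive a move in this sketch: only the `path` and `is_missing` columns change when the user re-links a file.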
Makes sense: defaulting the watcher to Downloads/Documents/Pictures covers most people, and the missing-path fallback is a clean way to handle moves. The thing I'd hit immediately is that my real files live in a ~/projects folder and an external drive outside those defaults: can I add custom watch roots, and when one watched file changes does it re-embed just that file incrementally or re-scan the whole folder? Curious too whether the embedding runs on a background thread so a big initial index doesn't peg CPU.
Dotient
@hi_i_am_mimo Custom watch roots are certainly on the roadmap after your concern. It would definitely be a smart feature to add, especially since the app basically lives off knowing the paths of your files. When a watched file changes, nothing is re-embedded at all; the app simply adjusts the file's path, or its name if you renamed it. It does not have to re-scan or re-embed a whole folder. Embedding and search both run on Rust background threads, so you shouldn't notice any UI lag while the models are being loaded. The only time you will notice UI lag during file processing is when your computer is computing file dimensions and dominant color metadata.
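The background-thread pattern mentioned above can be sketched with a job queue and a single worker. This is only an illustration: Python threads stand in for the Rust threads the reply refers to, and `fake_embed` and `start_embedding_worker` are hypothetical names, not part of Dotient.

```python
import queue
import threading


def fake_embed(path):
    """Stand-in for a real model call; returns a one-element toy vector."""
    return [float(len(path))]


def start_embedding_worker(embed, results):
    """Start a worker that embeds queued paths off the caller's thread."""
    jobs = queue.Queue()

    def run():
        while True:
            path = jobs.get()
            if path is None:  # sentinel: no more work
                jobs.task_done()
                break
            results[path] = embed(path)
            jobs.task_done()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return jobs, thread


results = {}
jobs, thread = start_embedding_worker(fake_embed, results)
for path in ["a.png", "report.pdf"]:
    jobs.put(path)  # returns immediately; the UI thread is never blocked
jobs.put(None)
thread.join()
```

The caller only enqueues paths, so any slow work stays on the worker. Work that still runs on the calling thread, such as the dimension and dominant-color computation described above, would remain visible as lag under this design.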
Local-first semantic search is the right call, the data leaving your machine is what kills these for real work. The thing I'd want as a user though: how do I trust it found everything? Keyword search fails loudly (zero results), but semantic search fails quietly, it returns something plausible and you never know what it missed. Do you surface a confidence or a "why this matched" so I can tell a real hit from a near-miss? That's what decides whether I rely on it or still fall back to ctrl-F.
Dotient
@david_marko You are indeed right to question that. Dotient runs a hybrid search system, BM25 + semantic in parallel, so anything keyword-findable won't silently fail. But you're right that the deeper "why this matched" question isn't fully solved yet. That's why we introduced this 'Signals System' within the lab page of the app. You can basically shift the entire embedding to what you believe is correct. Relevance is at least grounded in something you define. Proper confidence surfacing and match explanations are definitely on the roadmap, partly due to your concern. Appreciate this comment!
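As an illustration of how a keyword ranking and a semantic ranking can be combined, here is a minimal sketch using reciprocal rank fusion (RRF). The thread does not say how Dotient merges its BM25 and semantic results, so RRF is a swapped-in technique chosen for the example, and the function name and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into one list.

    Each id scores 1 / (k + rank) in every list it appears in, so an id
    found by both keyword and semantic search outranks one found by only one.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["invoice.pdf", "notes.txt"]
semantic_hits = ["receipt.png", "invoice.pdf"]
merged = reciprocal_rank_fusion([bm25_hits, semantic_hits])
```

Because the fusion only looks at ranks, it never has to compare a BM25 score with a cosine similarity, which are on different scales. A file with an exact keyword match still appears in the merged list even when the semantic side ranks it poorly.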
Local embeddings are the right call, but the part that bit me building this kind of thing is model versioning. The day you ship a better embedding model, every vector on disk is from the old one, so you either re-embed the whole drive, which is hours of background CPU, or run mixed old and new vectors where the query model and stored model disagree and recall quietly drops. How are you handling an embedding-model upgrade across an already-indexed machine, re-embed in place or version the index and migrate lazily?
Dotient
@dipankar_sarkar This does keep me up at night. Models are only getting better, and yes, across model versions or entirely different model makers, the embedding dimensions are almost never consistent. I'm not too concerned with upgrading the model at the moment, since I have spent so long optimizing this one to its absolute core. However, if it were to happen, I would do some sort of lazy migration: keep the old index queryable, slowly re-embed all the files with the new model, and run a dual query against both models, merging the results until the old index fully drains. Thanks for the comment!
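The lazy migration described above could look roughly like the sketch below. It is hypothetical throughout: the indexes are plain dicts of file id to vector, and `dual_query` and `migrate_one` are illustrative names. Each index is queried with a query vector from its own model, and results are merged by rank because similarity scores from two different models are not comparable.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(index, query_vec):
    """Return the file ids in `index`, most similar to `query_vec` first."""
    return sorted(index, key=lambda fid: (-cosine(index[fid], query_vec), fid))


def dual_query(old_index, new_index, old_query, new_query, k=60):
    """Query both indexes and merge by rank while the migration is in flight."""
    # Files that were already re-embedded are answered only by the new index
    pending = {fid: vec for fid, vec in old_index.items() if fid not in new_index}
    scores = {}
    for ranking in (rank(pending, old_query), rank(new_index, new_query)):
        for position, fid in enumerate(ranking, start=1):
            scores[fid] = scores.get(fid, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=lambda fid: (-scores[fid], fid))


def migrate_one(old_index, new_index, fid, new_embed):
    """Re-embed one file with the new model and drain it from the old index."""
    new_index[fid] = new_embed(fid)
    old_index.pop(fid, None)
```

Calling `migrate_one` from a low-priority background job drains the old index one file at a time. Once `old_index` is empty, `dual_query` only consults the new index, and the old model can be dropped.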
Local + semantic search is a nice combo; it feels like most semantic search tools assume you're fine sending everything to the cloud. How's the search quality holding up running fully local vs. what you'd get from a cloud-based embedding model?
Dotient
@martin_mo Search quality is good when you are specific with your queries. One-word queries usually aren't too strong, but that's what my Signals System is for. Users can shift the embedding space to fit their queries, almost like a super advanced tagging system. The gap really isn't that large if you know how to use the software properly, and in some cases the local model excels, especially with super specific queries such as "Bird with yellow beak outside on bird feeder."
Hey, how are we handling privacy in this? Like, what data actually leaves my device or workspace when I run Dotient on something sensitive?
Dotient
@romejerome Great question, Jerome! The answer is absolutely nothing. Nothing leaves your device. Everything is stored in the SQLite DB on your device. There is no server besides the small API that checks whether your license key is valid; that is the only time anything communicates outside of your device.
I love your website! Is the expected usage to train a signal before every unique search? If I had a bunch of group photos and I'm looking to search by name, I'd have to create a signal for each person by clicking a bunch of pictures they're in and not in?
Dotient
@noice30sugar Correct. To label a specific person with a name, it would probably be best to select a few images they are in, and maybe lower the threshold a bit just to see all the options. Then, if there are certain images that just won't go away and have nothing to do with the person, create a negative signal to eliminate those images. Once the signal is just right, you really shouldn't have to do any more adjusting. Any new image of that person you import will be picked up by the signal.
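One way to picture a signal built from positive and negative examples plus a threshold is Rocchio-style relevance feedback, sketched below. The thread does not describe how Signals work internally, so this is a swapped-in technique for illustration only; `build_signal`, `matches`, the weights `beta` and `gamma`, and the toy 2-D vectors are all assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_signal(positives, negatives=(), beta=1.0, gamma=0.5):
    """Rocchio-style signal: pull toward positives, push away from negatives."""
    signal = [beta * x for x in mean(positives)]
    if negatives:
        signal = [s - gamma * n for s, n in zip(signal, mean(negatives))]
    return signal


def matches(signal, library, threshold):
    """Ids whose embedding is at least `threshold` similar to the signal."""
    return sorted(fid for fid, vec in library.items() if cosine(signal, vec) >= threshold)


library = {
    "alice_1": [1.0, 0.0],
    "alice_2": [0.9, 0.1],
    "lookalike": [0.7, 0.7],
    "tree": [0.0, 1.0],
}
positives = [library["alice_1"], library["alice_2"]]
loose = matches(build_signal(positives), library, threshold=0.7)
strict = matches(build_signal(positives, [library["lookalike"]]), library, threshold=0.7)
```

In this toy library, the positive-only signal still lets `lookalike` through at a threshold of 0.7, and adding it as a negative example removes it without losing either `alice` image. Any new image whose embedding lands close to the signal would match without further adjustment, which mirrors the behavior described in the reply.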