Freddy Maldonado

Stable IDs for noisy embeddings: Idemapi - From similarity to certainty

idemapi brings fuzzy extractor-style determinism to complex, non-binary embeddings. Bind once, derive later, and keep exact comparison logic on your side.

Replies

Freddy Maldonado
Hi! I built this API because I kept running into the same frustrating problem: Modern systems increasingly rely on embeddings for identity-like tasks, but the operational layer on top of them is still messy.

You get a vector from a face model, voice model, document encoder, profile encoder, or some other embedding pipeline... and then what? Usually the answer is:

- compare with cosine similarity
- choose a threshold
- tune it by trial and error
- keep adjusting it as data quality changes
- hope the system stays understandable in production

That works, but it’s brittle. Embeddings are noisy by nature. The same source rarely produces the exact same vector twice. Exact hashing breaks immediately, while similarity thresholds create a lot of complexity around false accepts, false rejects, calibration, and ongoing maintenance.

This API is my attempt to offer a cleaner primitive: bind once, derive later, compare exact IDs.

You send an embedding to Bind and get back two things:

- an anchor
- a deterministic ID

Later, when you get a fresh embedding from the same source, you send it to Derive together with the stored anchor. If the source is the same, you recover the same ID. If not, you don’t. That means you can move from approximate similarity logic to exact application-level identity checks.

What I like most about this is that it stays narrow and composable:

- it does not replace your embedding model
- it does not own your users or your database
- it does not require a vector DB in the hot path
- it does not require retaining raw embeddings on your backend
- it gives you a stateless identity primitive you can plug into your own workflows

I think this can be useful in a lot of places:

- face verification
- voice auth
- privacy-preserving deduplication
- profile or document linking
- fraud controls
- model evaluation pipelines
- entity resolution systems

A big caveat, and an important one: this depends heavily on the quality of your embedding model. This API does not magically fix weak embeddings. What it does is turn embedding consistency into something operationally much easier to use.

If you’re working with embeddings in production, I’d genuinely love your feedback on:

- whether this solves a real pain point for you
- which use case feels most compelling
- what would make this trustworthy enough to adopt in a serious system

Thanks for checking it out!
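
For readers who want a feel for the bind once, derive later idea, here is a toy sketch in Python. It is not idemapi's algorithm or API; it is a classic code-offset (fuzzy commitment) construction with sign quantization and a repetition code, and every name in it (`bind`, `derive`, `quantize`, `REPEAT`) is a hypothetical stand-in chosen for illustration. A repetition code this small tolerates little noise and should not be treated as secure.

```python
import hashlib
import secrets

REPEAT = 5  # hypothetical: each secret bit is spread over 5 embedding dimensions


def quantize(embedding):
    """Toy quantizer: one bit per dimension, taken from the sign."""
    return [1 if x >= 0 else 0 for x in embedding]


def _digest(bits):
    """Hash the recovered secret bits into a stable hex ID."""
    return hashlib.sha256(bytes(bits)).hexdigest()


def bind(embedding):
    """Return (anchor, id). Assumes len(embedding) is a multiple of REPEAT."""
    bits = quantize(embedding)
    secret = [secrets.randbits(1) for _ in range(len(bits) // REPEAT)]
    # Repetition code: repeat every secret bit REPEAT times.
    codeword = [b for b in secret for _ in range(REPEAT)]
    # The anchor is the codeword masked by the quantized embedding.
    anchor = [c ^ b for c, b in zip(codeword, bits)]
    return anchor, _digest(secret)


def derive(embedding, anchor):
    """Recover the ID from a fresh embedding plus the stored anchor."""
    bits = quantize(embedding)
    noisy = [a ^ b for a, b in zip(anchor, bits)]
    secret = []
    for i in range(0, len(noisy), REPEAT):
        chunk = noisy[i:i + REPEAT]
        # Majority vote corrects up to 2 flipped bits per chunk of 5.
        secret.append(1 if sum(chunk) * 2 > REPEAT else 0)
    return _digest(secret)


enrolled = [0.9, -0.4, 0.7, 0.3, -0.8, 0.2, 0.6, -0.5, -0.9, 0.4]
fresh = [0.8, -0.5, 0.6, -0.1, -0.7, 0.3, 0.5, -0.4, -0.8, 0.1]  # one sign flip
other = [-x for x in enrolled]  # a clearly different source

anchor, enrolled_id = bind(enrolled)
print(derive(fresh, anchor) == enrolled_id)  # True
print(derive(other, anchor) == enrolled_id)  # False
```

The application only ever stores the anchor and compares IDs with `==`; the tolerance to noise lives in the error-correcting step rather than in a tuned similarity threshold.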