Yannick Schmitt's profile on Product Hunt

2mo ago

EMA-Gated Temporal Sequence Compression in Vision Transformers - No fine-tuning required

Vision Transformers waste 90% of their compute recalculating stationary asphalt. NeuroFlow tracks semantic surprise in embedding space, physically eliminating background tokens before the encoder.

Result: 55.8x wall-clock speedup for ViTs on high-res video (1792p) with 97% fidelity. No fine-tuning required.

NeuroFlow is a dynamic routing framework for Vision Transformer video inference. It exploits temporal redundancy by gating each patch on an embedding-distance threshold, measured against an Exponential Moving Average (EMA) of patch-level embeddings. This addresses the architectural mismatch between O(N²) self-attention and highly redundant natural video streams.
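To make the gating idea concrete, here is a minimal sketch of one frame of EMA-based token gating. It is not NeuroFlow's actual code: the function name `ema_gate`, the Euclidean distance metric, the `threshold` and `alpha` values, and the choice to update the EMA for every patch are all assumptions made for illustration.

```python
import math

def ema_gate(patches, ema, threshold=0.5, alpha=0.1):
    """Gate one frame of patch embeddings against a running average.

    patches: list of N embedding vectors for the current frame
    ema:     list of N running-average vectors, one per patch position
    Returns (keep, new_ema), where keep[i] is True if patch i changed
    enough to be worth sending to the encoder.

    Hypothetical sketch: metric, threshold, and alpha are assumed values.
    """
    # "Surprise" = distance between the current embedding and its average.
    keep = [math.dist(p, m) > threshold for p, m in zip(patches, ema)]

    # Standard EMA update, applied to every patch position (an assumption;
    # a real system might update kept and dropped patches differently).
    new_ema = [
        [(1 - alpha) * mi + alpha * pi for pi, mi in zip(p, m)]
        for p, m in zip(patches, ema)
    ]
    return keep, new_ema
```

Only the patches where `keep` is True would be passed on to the encoder. Because self-attention cost grows quadratically with token count, keeping a fraction k of the tokens reduces the attention cost to roughly k² of the original, which is where most of the savings on static backgrounds would come from.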

Yannick Schmitt
