Silero provides pre-trained, enterprise-grade speech-to-text (STT) models: enterprise-grade STT made refreshingly simple (seriously, see our benchmarks). Our quality is comparable to Google's STT, sometimes even better, and we are not Google.
Free, high-quality speech-to-text that runs fast on a single CPU thread. Not middleware, no cloud required, no GAFA (Google, Apple, Facebook, Amazon) APIs. The text-to-speech (TTS) models offer 10 voices in 5 languages. Everything can run locally on your PC, device, or phone.
Voice activity detection (VAD) models: stellar quality, highly portable, no strings attached. They support 8 kHz and 16 kHz audio and 30, 60, and 100 ms chunks, and each model is under 1 MB. Trained on 100+ languages, they generalize well. One chunk takes about 1 ms on a single CPU thread, and the ONNX version runs up to 2-3x faster.
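To make the sample rates and chunk lengths listed above concrete, here is a minimal sketch of the arithmetic for streaming audio in fixed-size chunks. It does not call any Silero API. The helper names (`chunk_size`, `split_into_chunks`, `real_time_factor`) are hypothetical, and zero-padding the final partial chunk is an illustrative assumption, not necessarily what the models expect.

```python
# Hypothetical helpers illustrating chunk arithmetic; no Silero API is used.
SUPPORTED_RATES = (8000, 16000)      # Hz, as listed in the text
SUPPORTED_CHUNK_MS = (30, 60, 100)   # milliseconds, as listed in the text


def chunk_size(sample_rate: int, chunk_ms: int) -> int:
    """Number of samples in one chunk of `chunk_ms` milliseconds."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if chunk_ms not in SUPPORTED_CHUNK_MS:
        raise ValueError(f"unsupported chunk length: {chunk_ms} ms")
    return sample_rate * chunk_ms // 1000


def split_into_chunks(samples, sample_rate: int, chunk_ms: int):
    """Yield consecutive chunks; the trailing partial chunk is zero-padded (an assumption)."""
    size = chunk_size(sample_rate, chunk_ms)
    for start in range(0, len(samples), size):
        chunk = list(samples[start:start + size])
        if len(chunk) < size:
            chunk.extend([0.0] * (size - len(chunk)))
        yield chunk


def real_time_factor(processing_ms: float, chunk_ms: int) -> float:
    """Processing time divided by audio duration; below 1.0 means faster than real time."""
    return processing_ms / chunk_ms


if __name__ == "__main__":
    for rate in SUPPORTED_RATES:
        for ms in SUPPORTED_CHUNK_MS:
            print(f"{rate} Hz, {ms} ms -> {chunk_size(rate, ms)} samples")
    print(f"RTF at ~1 ms per 30 ms chunk: {real_time_factor(1.0, 30):.3f}")
```

For example, a 30 ms chunk at 16 kHz is 480 samples. If one chunk takes about 1 ms to process, the model runs at roughly 1/30 of real time on a single thread.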