Botium Speech Processing

Open-Source Text-to-speech and Speech-to-text software stack

Botium Speech Processing combines the best Open Source speech processing tools in a single service and makes them accessible through an HTTP/JSON API.

Florian Treml
This project is the result of a year-long learning process in speech recognition and speech synthesis. The original task was to automate the testing of a voice-enabled IVR system. We started with real audio recordings, but it soon became clear that this approach is not feasible for a non-trivial app and that it would be impossible to reach satisfactory test coverage. We also had to find a way to transcribe the voice app's responses to text for our automated assertions. As cloud-based solutions were not an option (company policy), we quickly got frustrated, because there was no "get shit done" Open Source stack available for medium-quality text-to-speech and speech-to-text conversion. We learned how to train and use Kaldi, which according to some benchmarks is the best available system, but which mainly targets academic users and research. We made the heavyweight MaryTTS synthesize speech in reasonable quality. And finally, we packaged all of this in a DevOps-friendly HTTP/JSON API with a Swagger definition.
Florian Treml
Possible applications, some examples of what you can do with this:
* Synthesize audio tracks for YouTube tutorials
* Build voice-enabled chatbot services (for example, IVR systems)
* Classify audio file transcriptions
* Automate testing of voice services with Botium
Florian Treml
Demo Swagger UI available at https://speech.botiumbox.com
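
To give a rough idea of what talking to an HTTP/JSON speech API like this could look like, here is a minimal Python sketch. The endpoint paths (`/api/tts/<language>`, `/api/stt/<language>`), the `text` query parameter, and the `audio/wav` content type are assumptions made for illustration, as are the helper names `tts_url` and `stt_request`; check the Swagger definition for the actual routes and parameters before relying on any of this.

```python
import urllib.parse
import urllib.request

# Base URL of the demo instance mentioned above.
BASE_URL = "https://speech.botiumbox.com"


def tts_url(text, language="en", base_url=BASE_URL):
    """Build a text-to-speech request URL.

    Assumption: the route is /api/tts/<language> and takes a `text`
    query parameter. Verify against the Swagger definition.
    """
    query = urllib.parse.urlencode({"text": text})
    return f"{base_url}/api/tts/{language}?{query}"


def stt_request(wav_bytes, language="en", base_url=BASE_URL):
    """Build (but do not send) a speech-to-text POST request.

    Assumption: the route is /api/stt/<language> and accepts raw WAV
    audio in the request body. Verify against the Swagger definition.
    """
    return urllib.request.Request(
        f"{base_url}/api/stt/{language}",
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )
```

Sending the requests is then a matter of passing `tts_url("hello world")` or `stt_request(open("sample.wav", "rb").read())` to `urllib.request.urlopen` and reading the response. The helpers only build the requests, so they can be inspected or unit-tested without a running service.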