Open-Source Text-to-speech and Speech-to-text software stack | Botium Speech Processing

Botium Speech Processing combines the best Open Source speech processing tools in a single service and makes them accessible through an HTTP/JSON API.

This project is the result of a year-long learning process in speech recognition and speech synthesis. The original task was to automate the testing of a voice-enabled IVR system. We started with real audio recordings, but it soon became clear that this approach was not feasible for a non-trivial app and that it would be impossible to reach satisfying test coverage. We also had to find a way to transcribe the voice app's responses to text so that we could run our automated assertions. As cloud-based solutions were not an option (company policy), we very quickly got frustrated, because there was no "get shit done" Open Source stack available for medium-quality text-to-speech and speech-to-text conversion. We learned how to train and use Kaldi, which according to some benchmarks is the best available system, but which mainly targets academic users and research. We made the heavyweight MaryTTS synthesize speech in reasonable quality. And finally, we packaged all of this in a DevOps-friendly HTTP/JSON API with a Swagger definition.

Possible Applications - some examples of what you can do with this:

* Synthesize audio tracks for YouTube tutorials
* Build voice-enabled chatbot services (for example, IVR systems)
* Classify audio file transcriptions
* Automate testing of voice services with Botium

Demo Swagger UI available at https://speech.botiumbox.com
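
To give a feel for what calling an HTTP/JSON speech API looks like from a test script, here is a minimal Python sketch. The route shapes `/api/tts/{language}` and `/api/stt/{language}`, the `text` query parameter, and the `audio/wav` content type are assumptions made for illustration and are not confirmed by this page; the Swagger definition is the authoritative reference. The helper names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Demo host mentioned above; replace with your own installation.
BASE_URL = "https://speech.botiumbox.com"


def build_tts_url(base_url, language, text):
    """Build a text-to-speech URL (ASSUMED route: /api/tts/{language}?text=...)."""
    query = urlencode({"text": text})
    return f"{base_url.rstrip('/')}/api/tts/{quote(language, safe='')}?{query}"


def build_stt_request(base_url, language, wav_bytes):
    """Build a speech-to-text POST request (ASSUMED route: /api/stt/{language})."""
    url = f"{base_url.rstrip('/')}/api/stt/{quote(language, safe='')}"
    return Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},  # assumed payload type
        method="POST",
    )


def synthesize(base_url, language, text, timeout=30):
    """Fetch synthesized audio bytes; performs a real network call."""
    with urlopen(build_tts_url(base_url, language, text), timeout=timeout) as resp:
        return resp.read()
```

In an automated voice test, one would typically synthesize the user utterance with something like `synthesize(BASE_URL, "en", "hello world")`, play it to the voice app, and send the recorded response through the speech-to-text request to obtain text for assertions.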
