Vogent Voicelab is a platform for optimized inference of top open-source voice models, like Sesame's CSM-1B, Dia, Chatterbox, and more. Voicelab optimizes and post-trains these models to generate consistently high-quality speech ultra-fast.
We’re excited to launch Vogent Voicelab (vogent.ai/voicelab): an optimized API to run top open-source voice models.
New open-source text-to-speech models come out every week, with many ranking as state-of-the-art on popular benchmarks.
However, most of these models are not readily usable for high-volume, low-latency inference. Additionally, some research-preview models can struggle with hallucinations and inconsistent outputs. Finally, as with any model, self-hosting and managing compute can be a headache.
Voicelab solves these problems:
- Voicelab maintains a proprietary inference stack that is optimized to serve text-to-speech transformers efficiently and scalably.
- Voicelab post-trains select models to improve consistency and offer high-quality professional voice clones.
- Voicelab manages all compute, so you can pay for these models per character instead of managing GPUs.
All of this is exposed through a standard text-to-speech API (with streaming/websocket support) and an online playground.
Vogent Voicelab
Docs: docs.vogent.ai/voicelab
TTS Playground: app.vogent.ai
DiffSense
Would it be possible to make more "happy-go-lucky" 😬 voices? Kind of like a Verge review, e.g.:
Vogent Voicelab
@sentry_co We have a new voice model coming out soon that's better for narration and "emotive" voiceovers. Check us out again in 1-2 weeks!
Trufflow
One of my biggest challenges with AI voices is that they can sometimes feel a bit "robotic". For example, some of the "um" and "uh" filler words feel almost too intentional rather than accidental. What is the team's advice for making the tone feel more natural?
Vogent Voicelab
@lienchueh The CSM-1B model on Voicelab adds filler words on its own wherever they sound most natural, so you don't have to write them into the text you pass it. This should hopefully help!
If you're not a fan of the filler words, you can try decreasing the temperature and you should get fewer. Let me know what you think!
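For anyone wondering why lowering the temperature would cut down on fillers, here is a minimal sketch of temperature-scaled sampling in general. It is not Voicelab's actual sampler, which isn't described in this thread. The scores and the labels "continue", "um", and "uh" are made up for illustration, and a real speech model samples audio tokens rather than words.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for what comes next: a plain continuation vs. two fillers.
logits = {"continue": 2.0, "um": 1.0, "uh": 0.5}

for t in (1.0, 0.7, 0.3):
    probs = softmax_with_temperature(list(logits.values()), t)
    print(t, {name: round(p, 3) for name, p in zip(logits, probs)})
```

As the temperature drops, more of the probability mass moves to the highest-scoring option, so lower-scoring alternatives (the fillers, in this toy setup) get sampled less often.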
Just stumbled on Vogent and honestly? Kinda excited to see what these voice models can do. The page looks promising!
SECONDSENSE
We ran into this exact issue of hallucinations and inconsistency when trying to auto-generate voice-overs for our short-form video content... will give this a spin. Congrats!
Vogent Voicelab
@chrislucas Thank you!
Fast, optimized, and crystal-clear speech from top models! Vogent Voicelab nails it. Love it, it deserves a vote!
Metlo API Security
Wow, this is so cool. How do you guys compare against ElevenLabs and Cartesia?