Inworld builds the infrastructure for production voice AI. One platform with speech-to-text, an LLM router, and the top-ranked text-to-speech, all connected on a single API so context flows between every layer. Used by developers building voice agents, AI companions, and conversational apps.
This is the 5th launch from Inworld.

Realtime TTS-2
Launching today
Realtime TTS 1.5 is #1 on Artificial Analysis, voted best in blind tests by thousands of real users. TTS-2 builds on that with major upgrades: natural-language voice direction for tone, emotion, speed, and pitch; text-based voice design, where you describe a voice in words and generate it; cross-lingual synthesis across 100+ languages that preserves speaker identity; IPA phonetic control for brand names and rare words; and improved alphanumeric pronunciation. Try it free at inworld.ai/tts.








Inworld
Hi Product Hunt! We're back! I'm Kylan, CEO and co-founder of @Inworld.
Some of you might remember when we launched Inworld TTS here. It went on to become the #1 ranked voice AI on Artificial Analysis, voted best in blind listening tests by thousands of real users. That meant a lot to us, so we went back and rebuilt the model from the ground up.
Today we're launching Realtime TTS 2.0. Try the live speech-to-speech experience at realtime.ai.
Here's the thing we kept hearing from builders: voice AI was built for audiobooks and voiceovers. It sounds good, but it sounds like a human reading from a script. If you've ever talked to a voice agent and thought "something feels off," that's why. Realtime conversation is a completely different problem, and we decided to solve it.
What can you build with it?
Companion apps that adapt to your user's mood and tone in real time through natural language voice direction
Language tutors that switch languages mid-session with the same voice, no re-recording
Characters that sound exactly how you describe them with text-based voice design
Support agents that get every code, name, and number right with improved alphanumeric handling and International Phonetic Alphabet (IPA) support
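As one illustration of the IPA control mentioned above, here is a minimal sketch of how a phoneme hint might be embedded in input text. The `<phoneme>` tag syntax here is borrowed from SSML for the sake of the example; Inworld's actual markup may differ, so treat the tag format as an assumption and check the docs at inworld.ai/tts.

```python
def with_ipa(word: str, ipa: str) -> str:
    """Wrap a word in a hypothetical phoneme tag so the engine reads the
    supplied IPA transcription instead of guessing from spelling."""
    return f'<phoneme ipa="{ipa}">{word}</phoneme>'

# A brand or family name the default pronunciation model would likely mangle:
line = f"Your order from {with_ipa('Nguyen', 'ŋwiən')} Electronics ships today."
print(line)
```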
So what actually changed?
Natural conversationality. We trained the model on conversational speech instead of narration. You get natural rhythm, breath, micro-pauses, the cadence humans actually use when they talk to each other. Every voice you build on TTS 2.0 sounds like a person in conversation, not a narrator.
Conversational awareness. TTS 2.0 is informed by the full audio context of the multi-turn exchange. Not just the current sentence, the whole conversation. How it speaks adapts to how it was spoken to. A line delivered after a joke lands differently than the same line after bad news. The model knows the difference because it heard what came before.
Full voice direction. You steer the model with natural language the way you'd direct a voice actor. Not preset emotion tags, full descriptions: "act like you just got home from a long day, tired but warm." Combined with inline controls for specific moments ([whispering], [sigh], [excited]), the voice is as controllable as it is expressive.
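To make the two levels of control concrete, here's a sketch of what a synthesis request combining a free-form acting note with inline markers might look like. The field names (`voiceId`, `voiceDirection`) and the voice id `Myles` are assumptions for illustration, not the documented Inworld API surface.

```python
import json

def build_tts_request(text: str, voice_id: str, direction: str) -> dict:
    """Assemble a hypothetical TTS-2 synthesis payload.

    `voiceDirection` carries the scene-level acting note, while inline
    markers like [whispering] or [sigh] stay embedded in the text for
    moment-level control.
    """
    return {
        "voiceId": voice_id,
        "text": text,
        "voiceDirection": direction,  # free-form, like a note to a voice actor
    }

payload = build_tts_request(
    text="[sigh] Honestly? [whispering] I just want to sit down for a minute.",
    voice_id="Myles",
    direction="act like you just got home from a long day, tired but warm",
)
print(json.dumps(payload, indent=2))
```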
Text-based voice design. Describe a voice in plain text, generate it. "A posh British man, aged 30-40, speaking deliberately." Iterate on the prompt until it fits, save it, deploy it. No casting calls, no recording booth.
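The describe-iterate-deploy loop above could be sketched like this. The request shape (`voiceDescription`, `previewText`) is invented for illustration; the real voice-design endpoint lives behind inworld.ai/tts and may use different names.

```python
def design_voice(description: str, preview_text: str) -> dict:
    """Assemble a hypothetical voice-design request: the description is the
    entire spec, and the preview text is what you listen to while iterating."""
    return {
        "voiceDescription": description,
        "previewText": preview_text,
    }

# Tweak the description until the preview sounds right, then save the
# resulting voice and deploy it like any other.
request = design_voice(
    description="A posh British man, aged 30-40, speaking deliberately",
    preview_text="Good evening. Shall we begin?",
)
print(request)
```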
Crosslingual fluency. One voice across 100+ languages with on-the-fly switching inside a single generation. Your voice identity is preserved across every language. No re-recording, no managing separate voices per locale.
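The practical difference is in how much voice inventory you manage. A sketch of the contrast, with invented ids and field names (the per-locale mapping is the pattern being replaced, not part of any Inworld API):

```python
# The old way: source and maintain a separate voice per locale, then route
# each piece of text to whichever voice matches its language.
voices_per_locale = {
    "en-US": "myles_en",
    "fr-FR": "claire_fr",
    "hi-IN": "arjun_hi",
}

# With cross-lingual synthesis as described above: one voice identity and
# mixed-language text, with switching handled inside a single generation.
payload = {
    "voiceId": "Myles",  # hypothetical voice id
    "text": "Welcome back! Comment s'est passée ta journée ? सब ठीक है?",
}
print(payload)
```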
Realtime TTS 1.5 is still #1 on the leaderboard. TTS 2.0 takes that quality and adds everything that was missing to uplevel realtime conversation.
Learn more at inworld.ai/tts. Happy to answer any questions in the comments.
– Kylan
DiffSense
It sounds too much like audiobook narration. I guess it was trained on that input? Same thing that plagues every single ElevenLabs voice. The only voice that sounds human out there is the Alloy voice from OpenAI, and that's an old AI voice. It's so strange. This field should be wide open. Competitive. What's going on? What am I missing?
Inworld
@conduit_design Did you try Myles on realtime.ai? Curious what feels off there for you.
DiffSense
@kylan_gibbs I can't speak back to it, I'm at the library 😂 but the opening sounded audiobook-ish. It's just that humans do not speak like audiobooks. We speak much less theatrically, in a way. Do you know what I mean? I guess AI voice will be unsolved for a while still, stuck in the audiobook period, until we escape that. I feel the Alloy voice released like 2-3 years ago escapes this, but it takes like 10-12 recordings to get it with the right flow. I wish it was possible to just say: end it with less punchiness, not so fast, take it easy on the first part, etc. Instead of editing and doing many iterations. Or have an AI just solve that editing. Then just write a text and say: make this sound like a product demo, something The Verge would make, etc., and an AI would take care of the iterating and editing.
Inworld
@conduit_design Give it a try when you can speak to it; the naturalness comes from how it interprets you and your context as well. Let me know once you give it a shot!
I'm most excited about the improvements made in cross-lingual. It's so seamless to have an engaging conversation and switch between multiple languages like English, Hindi, then French and it's the same voice.
Inworld
Hey everyone, Andreas from the Inworld team! I've been pumped about this launch for weeks and I'm so excited that we finally get to share TTS-2 with you all. If you want to hear what it can do, jump into the playground at inworld.ai/tts and try voice design or steering for yourself, or play with our realtime demo at realtime.ai. Would love to hear your reactions!
Inworld
Realtime TTS 2 is our best model yet.
It's designed to be the frontend of voice-interfaced applications of any kind and scale.
Beyond the naturalness and multilingual quality improvements, this iteration can't really be called just another TTS: much like speech-to-speech models, Realtime TTS 2.0 was trained to be explicitly steered toward the most appropriate response, given the conversation context and the agent's goal.
Check it out!
So I tried it, speech to speech. It confuses itself and hallucinates very quickly with just basic questions and conversation. I asked both bots how they are, what they're doing today, and what they're doing for dinner, and they gave me answers from completely different ends of the spectrum. They gave a lot of filler responses like hey, hmm, huh, which I can understand why those are there. But Jason started telling me how to increase the gain of my television set, and Sarah thought I was going to a party. Also the vocal fidelity leaves a lot to be desired in speech to speech. Just my honest feedback so far. Keep at it.
training on conversation instead of narration is the right call. every voice agent i've tried sounds like an audiobook reading my support ticket back.
congrats team !!