I’m Hyde, the solo dev behind IndexTTS-2 Online (indextts-2.com).

This project started from a very selfish need: I love playing with new TTS models, but I hate dealing with GPUs, CUDA errors, giant checkpoints and random “works on my machine” repos. I wanted something where I could just open a browser, paste some text, and get back a voice that actually sounds like a human — with emotion and proper timing — without touching a terminal.

So I wrapped the open-source IndexTTS-2 model into a small SaaS-style tool.

🎙 What it is

IndexTTS-2 Online is a browser-based studio for emotionally expressive, duration-controlled, zero-shot text-to-speech.

You can:

Type text and get natural, expressive speech (not the flat “robot podcast” sound).
Upload or record a short voice reference and have the model speak in that voice.
Control duration so the audio fits your video cuts, subtitles, or lip-sync window.
Use it for Chinese, English, Japanese and some cross-lingual cases.

Use cases I had in mind: YouTube / TikTok dubbing, quick voice tracks for indie games, early drafts for audiobooks & podcasts, or multilingual versions of the same script.

⚙️ Under the hood

On the backend I’m running IndexTTS-2, an autoregressive TTS model, together with:

A reference encoder for timbre & style (the “voice cloning” part).
Duration / alignment control so you can aim for specific lengths.
A simple API layer that the web app calls from the browser.
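
To make the duration-control idea concrete, here is a minimal sketch of how a request to the API layer could be built from a subtitle cue. It is an illustration only, not the production code: the field names (`voice_ref_id`, `target_seconds`), the limits, and the `cue_duration` helper are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical limits, chosen for illustration; the real service may differ.
MAX_TEXT_CHARS = 500
MIN_SECONDS, MAX_SECONDS = 0.5, 60.0


@dataclass
class TTSRequest:
    text: str
    voice_ref_id: Optional[str] = None      # hypothetical ID of an uploaded reference clip
    target_seconds: Optional[float] = None  # None = let the model choose its own length

    def validate(self) -> None:
        """Reject requests that are empty, too long, or ask for an out-of-range duration."""
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if len(self.text) > MAX_TEXT_CHARS:
            raise ValueError(f"text longer than {MAX_TEXT_CHARS} characters")
        if self.target_seconds is not None and not (
            MIN_SECONDS <= self.target_seconds <= MAX_SECONDS
        ):
            raise ValueError("target_seconds out of range")


def cue_duration(start: str, end: str) -> float:
    """Length in seconds of a subtitle cue given SRT-style 'HH:MM:SS,mmm' timestamps."""

    def to_seconds(ts: str) -> float:
        hms, ms = ts.split(",")
        hours, minutes, seconds = (int(part) for part in hms.split(":"))
        return hours * 3600 + minutes * 60 + seconds + int(ms) / 1000

    return round(to_seconds(end) - to_seconds(start), 3)


# Aim the generated audio at the length of one subtitle cue.
request = TTSRequest(
    text="Welcome back to the channel!",
    target_seconds=cue_duration("00:00:01,200", "00:00:03,700"),
)
request.validate()
payload = json.dumps(asdict(request))  # JSON body the browser would send
```

Keeping `target_seconds` optional means the same request shape covers both "just read this" and "fit this into a fixed window".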

The frontend is a boring-but-solid stack (Next.js + Tailwind etc.), with a small queue system so the model doesn’t fall over when multiple people generate at once.
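
The queue idea can be sketched in a few lines. This is a simplified illustration rather than the actual implementation: `fake_synthesize` stands in for the real model call, and `MAX_CONCURRENT` and the `GenerationQueue` class are assumed names and values.

```python
import asyncio

MAX_CONCURRENT = 2  # assumed number of generations the GPU can handle at once


async def fake_synthesize(text: str) -> bytes:
    """Placeholder for the real model call; sleeps briefly and returns dummy bytes."""
    await asyncio.sleep(0.01)
    return f"audio for {text!r}".encode()


class GenerationQueue:
    """Lets at most `limit` jobs run at once; the rest wait for a free slot."""

    def __init__(self, limit: int) -> None:
        self._slots = asyncio.Semaphore(limit)
        self.running = 0  # jobs currently generating
        self.peak = 0     # highest concurrency observed so far

    async def submit(self, text: str) -> bytes:
        async with self._slots:  # waits here while all slots are taken
            self.running += 1
            self.peak = max(self.peak, self.running)
            try:
                return await fake_synthesize(text)
            finally:
                self.running -= 1


async def main():
    queue = GenerationQueue(MAX_CONCURRENT)
    # Five users hit "generate" at the same time.
    results = await asyncio.gather(*(queue.submit(f"line {i}") for i in range(5)))
    return queue, results


queue, results = asyncio.run(main())
```

A semaphore keeps the sketch short; a production setup would more likely use a persistent job queue so that waiting requests survive a restart.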

There’s a free tier to play with, and a Pro plan with higher limits plus the ability to upload or record your own voice reference.

🙏 I’d love your feedback

I’m especially curious about:

Voice quality vs. latency – is it good enough for your use case?
Duration control – does it help in your real editing workflow?
Product direction – what would make this a “must-have” tool for you (API, batch jobs, plugins, etc.)?

If you try it and break it, or get funny/uncanny outputs, please share them — those are super helpful for improving the product.

Thanks for checking it out and supporting indie builders! 💛
