KugelAudio

Real-time text-to-speech model you can self-host

109 followers

Text-to-Speech Software • Realtime Voice AI

Most natural real-time TTS with voice cloning and sub-60ms latency, on-prem or via API. Grammar-aware normalization reads phone numbers, IBANs, addresses, and medications naturally across 25+ languages, with word-level timestamps and IPA support. Adapters for LiveKit, Pipecat, and Vapi. Built by 4 in Berlin.

Free Options

Launch tags: API • Developer Tools • Artificial Intelligence

KugelAudio

Maker

Hi PH 👋 Kajo from KugelAudio here. We're 4 people in Berlin, currently in SF for YC. We're building for the future, which we believe will be conversational. Today we're shipping a real-time TTS model with voice cloning. If you only have a minute, here's what I'd want you to know.

You can clone a voice from 30 to 60 seconds of audio. Drop in a short sample and you get a working voice immediately.

We optimized for voice agents: latency below 60 ms (excluding network), with both input and output streaming.

We offer on-premise support: run the model in your own cluster instead of calling our API. Useful when data needs to stay on your network.

Adapters for LiveKit, Pipecat, and Vapi. SDKs in Python, JS, and Java. Free tier so you don't have to talk to us before trying it.

Alex (our founding engineer) is in the thread today for the hard questions: model architecture, where the latency came from, what we broke before this version worked. Ask anything!

2mo ago

Mailwarm

How does it handle long numbers and addresses in mixed language contexts, like German with English product names?

2mo ago

KugelAudio

Maker

@thamibenjelloun Just tried it out and it works :)

1mo ago

KugelAudio

Maker

@thamibenjelloun It will understand the difference, including when your product name has a special pronunciation. Additionally, we have a dictionary feature where you can specify the exact pronunciation of a term.

1mo ago
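
For readers unfamiliar with the idea, here is a minimal sketch of what a pronunciation dictionary does in general: it rewrites selected terms before the text reaches the synthesizer. This is not KugelAudio's API or implementation. The `LEXICON` entries, the respellings, and the `apply_lexicon` helper are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical lexicon: written term -> respelling handed to the TTS engine.
# These entries are illustrative assumptions, not taken from KugelAudio.
LEXICON = {
    "Nike": "Nai-kee",
    "IKEA": "Ee-kay-ah",
}


def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Replace whole-word, case-sensitive matches of lexicon keys."""
    if not lexicon:
        return text
    # Try longer keys first so a multi-word entry wins over a shorter prefix.
    keys = sorted(lexicon, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in keys)
    # Lookarounds keep "Nike" from matching inside "Nikes" or "XNike".
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda m: lexicon[m.group(1)], text)


if __name__ == "__main__":
    # German sentence with an English product name, as in the question above.
    print(apply_lexicon("Die neuen Nike Schuhe sind da.", LEXICON))
```

A real system would more likely map terms to IPA or phoneme strings than to respellings, but the lookup-and-replace step would look similar.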

Congratulations on the launch, guys! A few questions:
1. The website lists languages like Hindi as supported, but I cannot find any Hindi voice in the playground.

2. I checked the WebSocket streaming API for TTS. Is it possible to have multi-context support (like ElevenLabs) in the streaming API? Is that planned for the future?

Cheers!

2mo ago

KugelAudio

Maker

@ashishkingdom Currently we don't have stable support for Hindi, but feel free to try it with a voice sample for cloning. Our main focus is currently European languages, and we are gathering many different voices and accents right now. Regarding your second question, we are compatible with the ElevenLabs SDK and offer a multi-context endpoint that is currently used in the LiveKit integration.

2mo ago

@alexnetz super!! will check that out

2mo ago

Doesn't this support the Nepali language?

2mo ago

KugelAudio

Maker

@manish_regmi1 Hey, not yet.

2mo ago

Pushary

Sensational for German TTS

Congrats on the launch, guys!

2mo ago
