Jim Leighty, LICSW

Realistic audio with expanded emotional range

I'm trying to create realistic audio to support training scenarios for frontline staff in homeless shelters and housing programs who work with clients. The challenge is finding realistic voices that have a wide range of emotional affect. We are hoping to find a generative approach to developing multiple voices rather than creating voices with actors or ourselves. We've tried v3 Voice Design, which improves on monotone generated voices, but not by much. We want voices that go from soft whispers to screaming and everything in between. Perhaps I'm not very good at prompting, but I've tried various approaches. Again, we're trying to do this without needing to record every voice, which is not sustainable for our approach. Any recommendations? Thanks!

187 views

Replies

Sanskar Yadav

Great question! Emotional range is still the most challenging part with AI voices. ElevenLabs is probably your best bet right now, but blending real recordings for extreme emotions with generated speech helps cover those gaps. Also, some creators use emotion tags and vary prompt styles, but the results are hit-or-miss. Curious if anyone’s cracked truly lifelike, expressive AI voices yet.

Jim Leighty, LICSW

@sanskarix Emotion tags help a little, but the range still seems restricted/muted.

Sourn Rockit

That’s such a meaningful use of technology, Jim. Realistic audio with emotional depth could really make training more impactful for staff in those challenging environments.

Abdul Rehman

How scalable is it? Could it handle dozens of different voice profiles for different roles?

Jim Leighty, LICSW

@abod_rehman We'll have dozens of personas representative of the diversity of individuals experiencing homelessness or living in supportive housing.

Thibault (aka TBot, but still human 🤪)

Hey Jim, your use case is really interesting and carries nice values. I may be able to introduce you to a few people who can help.

Jim Leighty, LICSW

@thibaulttbot I'd appreciate any assistance. Thanks.

Top9Trends AI

Hybrid Workflow (Recommended)

For your use case (training for frontline shelter staff), the best balance is a hybrid generative workflow:

Choose 3–4 base voices (e.g., two from ElevenLabs, two from Azure).

Script scenes with emotion tags — e.g. [calmly], [nervously], [yelling], [crying softly].

Generate speech variants and blend/sequence them to simulate real escalation (using crossfades or gain adjustments in an audio editor like Audacity, Descript, or Reaper).

Optionally, use whisper/scream sound design overlays (non-verbal breaths, gasps, sighs) from sound libraries like Boom Library or BBC Sound Effects for realism.

This hybrid approach gives you emotional realism without recording actors.
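
To make the tagging, gain, and crossfade steps concrete, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not any vendor's API: the `Line` class and the `tag_line`, `apply_gain`, and `crossfade` helpers are hypothetical names, the audio is assumed to be already decoded into lists of float samples between -1.0 and 1.0, and whether a given text-to-speech service honors bracketed tags like `[calmly]` will vary by model.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One scripted line for a persona (hypothetical structure)."""
    speaker: str
    emotion: str  # e.g. "calmly", "nervously", "yelling"
    text: str


def tag_line(line):
    """Render a line as '[emotion] text', the tag style used in the steps above."""
    return f"[{line.emotion}] {line.text}"


def apply_gain(samples, gain_db):
    """Scale samples by a gain in decibels, clipping to the -1.0..1.0 range."""
    factor = 10 ** (gain_db / 20)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def crossfade(a, b, overlap):
    """Join two clips, linearly fading a out and b in over `overlap` samples."""
    a, b = list(a), list(b)
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must fit inside both clips")
    if overlap == 0:
        return a + b
    start = len(a) - overlap
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the incoming clip
        mixed.append(a[start + i] * (1 - w) + b[i] * w)
    return a[:start] + mixed + b[overlap:]


# An escalation sequence for one persona, rendered as tagged prompts.
scene = [
    Line("client", "calmly", "I just need a bed for tonight."),
    Line("client", "nervously", "You said that an hour ago."),
    Line("client", "yelling", "Nobody here is listening to me!"),
]
prompts = [tag_line(line) for line in scene]
```

Each entry in `prompts` would be sent to whichever voice service you choose, and the returned clips could then be passed through `apply_gain` and `crossfade` in a script rather than by hand in an editor, which keeps post-production light.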

Jim Leighty, LICSW

@top9trends_ai Thanks. We're hoping to avoid overlays and extensive post-production work so that we can develop a workflow (likely hybrid) to quickly generate the voices. Future goal is to use these voices in a dynamic environment and we need to minimize lag. ElevenLabs and Azure seem to have a good selection of base voices. Any other sites you'd recommend?

Prithvi Damera

That’s such an important and compassionate use case, Jim — love that you’re focusing on realism and emotional nuance for training frontline workers. It’s true that most AI voice models still struggle with dynamic emotional range.

You might explore multi-agent setups or fine-tuned emotional conditioning. At Growstack, we’ve been experimenting with AI agents that adapt tone and emotion contextually, and the results are surprisingly human-like. Would be happy to exchange notes if you’re exploring similar generative pipelines.