27d ago

Microsoft MAI-Voice-2 - Expressive TTS with voice cloning in 15 languages

Microsoft's most expressive TTS model yet: voice cloning from short samples, fine-grained emotional control, and consistent voice identity across 15 languages. Now live in Azure AI Foundry at $22 per million characters, with integrations rolling out in VS Code, Dynamics 365 Contact Center, and Teams. For builders shipping voice agents who need production-grade prosody without the OpenAI Realtime API price tag.

29d ago

MAI's 7 New Models - Reasoning, Code, Image, Voice & Transcription AI

At Microsoft AI, our vision is humanist superintelligence. That means building world-class models that are as safe as they are capable, made for the demands of real work, and designed not to outpace human potential, but to amplify it.

3mo ago

MAI-Transcribe-1 - Production ASR for noisy multilingual audio

MAI-Transcribe-1 is Microsoft’s new multilingual speech-to-text model built for real-world audio. It delivers best-in-class accuracy across 25 languages, strong robustness in noisy environments, faster batch transcription, and pricing aimed at production speech workflows.

3mo ago

MAI-Image-2 - Microsoft's top-tier text-to-image model for creatives

MAI-Image-2 is Microsoft's new text-to-image model built with photographers, designers, and visual storytellers in mind. It pushes hard on photoreal lighting, reliable in-image text, and rich cinematic scenes for actual creative work.