MAI-Transcribe-1 is Microsoft’s new multilingual speech-to-text model built for real-world audio. It delivers best-in-class accuracy across 25 languages, strong robustness in noisy environments, faster batch transcription, and pricing aimed at production speech workflows.
MAI-Image-2 is Microsoft’s new text-to-image model built with photographers, designers, and visual storytellers in mind. It emphasizes photorealistic lighting, reliable in-image text, and rich cinematic scenes for real creative work.