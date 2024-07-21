OpenAI

gpt-realtime

Launching today
For reliable, production-ready voice agents
gpt-realtime is OpenAI's new speech-to-speech model for production voice agents, delivering low latency and natural, expressive speech. The Realtime API is now GA, adding key features for developers like remote MCP support, image input, and SIP phone calling.
Hi everyone!

OpenAI's new gpt-realtime model is a big step forward for voice agents. The key isn't just a faster model, but a shift in how it understands speech.

For a true voice agent to work, it needs to understand the subtle cues in our speech: the tone, the pauses, the emotion. That's what carries the real meaning. gpt-realtime is built on a voice-in, voice-out approach. It processes audio directly, without first transcribing it to text. This is the breakthrough the field has been working toward.

Also great to see the Realtime API is now generally available, with practical new features for production like remote MCP server support and SIP integration.

Nicole Astor

So cool! Now companion products can integrate with the Realtime API, which is a big step forward for improving user experience. I can't wait to try out real-time conversations! @OpenAI