gpt-realtime - For reliable, production-ready voice agents
gpt-realtime is OpenAI's new speech-to-speech model for production voice agents, delivering low latency and natural, expressive speech. The Realtime API is now GA, adding key features for developers like remote MCP support, image input, and SIP phone calling.



Replies
Flowtica Scribe
Hi everyone!
OpenAI's new gpt-realtime model is a big step forward for voice agents. The key isn't just a faster model, but a shift in how it understands speech.
For a true voice agent to work, it needs to understand the subtle cues in our speech: the tone, the pauses, the emotion. That's what carries the real meaning. gpt-realtime is built on a voice-in, voice-out approach. It processes audio directly, without first transcribing it to text. This is the breakthrough the field has been working toward.
Also great to see the Realtime API is now generally available, with practical new features for production like remote MCP server support and SIP integration.
YouMind
So cool! Now companion products can integrate with the Realtime API, which is a big step forward for improving user experience. I can't wait to try out real-time conversations! @OpenAI
DiffSense
Voice is definitely faster than typing. Is this the end of open-plan offices?
Fakeradar
We just have to wait a little bit longer and we'll be able to talk to ChatGPT while driving, without looking at the iPhone screen...
Magiclight
This looks amazing — love how you’re empowering creators to scale AI experiences.
Triforce Todos
The real test will be whether it can pick up hesitation, sarcasm, or subtle emphasis. That’s where most AI agents break down.
Pretty cool update
Vomyra AI – Voice AI Agent