gpt-realtime is OpenAI's new speech-to-speech model for production voice agents, delivering low latency and natural, expressive speech. The Realtime API is now GA, adding key features for developers like remote MCP support, image input, and SIP phone calling.
On their livestream today, OpenAI just released a bunch of new tools for reliably building and using AI agents. From what I can tell, this is what's new-
New APIs:
Responses API - a new multi-modal API that builds on chat completions to allow for the next-generation of tool calling, starting with the new tools announced today.
gpt-oss-safeguard is a new family of open-source safety models (120b & 20b) from OpenAI. They use reasoning to classify content based on a custom, developer-provided policy at inference time, providing an explainable chain-of-thought for each decision.
I don't see a lot of products using the realtime api in building their conversation ai agents. Given that it now has realtime communication support through WebRTC allowing low latency conversations, I expected it to blow up. Are there any limitations of this model like hallucinations and or is it just too expensive for commercial use?
SWE-Lancer is an open-source benchmark from OpenAI, featuring 1,400+ real-world software engineering tasks sourced from Upwork. Test your AI's coding and managerial skills.