OpenAI Launches New Real-Time Voice Models for Translation, Live Conversations

OpenAI has launched three new voice models for developers, expanding its real-time audio tools that can speak, translate and transcribe conversations as they happen.

The company said the new tools are designed to make voice-based apps more useful in everyday situations, especially where users need software to respond naturally while carrying out tasks in real time.

At the centre of the launch is GPT-Realtime-2, a voice model OpenAI says can handle more difficult requests while keeping conversations flowing naturally.

The company said that, unlike earlier versions, the model uses GPT-5-level reasoning to manage interruptions, understand context better and carry out actions during conversations.

OpenAI also unveiled GPT-Realtime-Translate, a live translation model that can translate speech from more than 70 input languages into 13 output languages.

According to the company, the system keeps pace with the speaker during conversations instead of translating after pauses or completed sentences.

The third model, GPT-Realtime-Whisper, focuses on live speech transcription. It converts spoken words into text instantly while a person is talking.

“Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” OpenAI said.

Voice products have become a major focus for technology companies as more users interact with software through speech instead of typing. OpenAI said developers want systems that can manage tasks while conversations continue naturally.

The company pointed to customer support, travel, education, media and creator platforms as some of the areas expected to benefit from the new models.

OpenAI also described three growing patterns it sees in voice-based software.

The first is “voice-to-action”, where users speak naturally and the system completes tasks on their behalf. OpenAI said property platform Zillow is building an assistant that can help users search for homes, avoid certain neighbourhood conditions and book tours through voice requests.

Another pattern is “systems-to-voice”, where software provides spoken updates automatically. OpenAI gave the example of travel apps that could alert passengers about delayed flights, new boarding gates or transfer routes without users typing commands.

The third area is “voice-to-voice”, which focuses on live multilingual conversations. OpenAI said Deutsche Telekom is developing customer support systems that translate discussions instantly while both sides continue speaking in their preferred languages.

Travel company Priceline is also working on voice-based trip management tools, according to OpenAI. Travellers could eventually book flights, change hotel reservations and receive airport updates entirely through conversation.

Alongside the broader rollout, OpenAI added several new features to GPT-Realtime-2 aimed at improving live interactions.

Developers can now have the model speak short phrases such as “let me check that” or “one moment while I look into it” before it completes a request. OpenAI said this gives users clearer feedback while the model processes tasks in the background.

The model can also call multiple tools at once and explain those actions aloud during conversations. OpenAI said the system may say things like “checking your calendar” or “looking that up now” while working through requests.
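To make the idea of calling several tools at once while narrating them more concrete, here is a minimal Python sketch. It does not use OpenAI's Realtime API; the tool functions, the `speak` callback and the helper `run_tools_with_preamble` are all hypothetical names chosen for illustration, and the pattern is only one plausible way an application might pair spoken status phrases with concurrent tool calls.

```python
import asyncio

# Hypothetical tools: a real app would call a calendar or search backend here.
async def check_calendar(day: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return f"no meetings on {day}"

async def look_up_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return f"forecast for {city} unavailable in this sketch"

async def run_tools_with_preamble(calls, speak):
    """Speak one short status phrase per tool, then run all tools concurrently."""
    for phrase, _ in calls:
        speak(phrase)
    # asyncio.gather returns results in the same order as the calls passed in.
    return await asyncio.gather(*(make_call() for _, make_call in calls))

spoken = []  # collects the phrases instead of sending them to a speaker
results = asyncio.run(run_tools_with_preamble(
    [
        ("checking your calendar", lambda: check_calendar("Friday")),
        ("looking that up now", lambda: look_up_weather("Lagos")),
    ],
    spoken.append,
))
```

In this sketch the status phrases are emitted before any tool finishes, so the user hears feedback immediately while both lookups run in parallel.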

The company added that GPT-Realtime-2 recovers better from errors or failed requests instead of stopping conversations abruptly. It also supports a larger context window, up from 32K to 128K tokens, allowing longer and more detailed conversations.
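A larger context window still has a ceiling, so long-running voice sessions typically need some way to drop old turns. The Python sketch below illustrates one simple approach, keeping only the most recent turns that fit a budget. It assumes 128K means roughly 128,000 tokens and uses a whitespace word count as a crude stand-in for a real tokenizer; `trim_history` and `rough_token_count` are hypothetical helpers, not part of any OpenAI library.

```python
from collections import deque

CONTEXT_BUDGET = 128_000  # assumed reading of the "128K" figure, in tokens

def rough_token_count(text: str) -> int:
    # Crude approximation: real tokenizers split text differently.
    return len(text.split())

def trim_history(turns, budget=CONTEXT_BUDGET):
    """Keep the most recent turns whose combined rough token count fits the budget."""
    kept = deque()
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = rough_token_count(turn)
        if used + cost > budget:
            break  # stop at the first older turn that no longer fits
        kept.appendleft(turn)  # preserve chronological order
        used += cost
    return list(kept)
```

For example, with a budget of 3, `trim_history(["a b c", "d e", "f"], budget=3)` keeps the two newest turns and drops the oldest.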

OpenAI further noted that the model has improved understanding of specialised terms, including healthcare vocabulary and proper nouns. Developers can also adjust how much reasoning power the model uses depending on the complexity of a request.

According to benchmark figures released by the company, GPT-Realtime-2 achieved higher scores than GPT-Realtime-1.5 on audio intelligence and instruction-following tests.

All three models are available through OpenAI’s Realtime API. The company said GPT-Realtime-Translate and GPT-Realtime-Whisper will be billed by the minute, while GPT-Realtime-2 pricing depends on token usage.

OpenAI said it has added safeguards to reduce misuse, including protections against spam, fraud and harmful content. The company also added that conversations can be stopped automatically if they break its safety rules.

…designed to help apps listen, reason and respond more naturally during live interactions

Joan Aimuengheuwa
