Speech-to-Text Archives - Tech | Business | Economy

OpenAI Launches New Real-Time Voice Models for Translation, Live Conversations

Joan Aimuengheuwa — Fri, 08 May 2026 08:30:41 +0000

OpenAI has launched three new voice models for developers, expanding its real-time audio tools that can speak, translate and transcribe conversations as they happen.

The company said the new tools are designed to make voice-based apps more useful in everyday situations, especially where users need software to respond naturally while carrying out tasks in real time.

At the centre of the launch is GPT-Realtime-2, a voice model OpenAI says can handle more difficult requests while keeping conversations flowing naturally.

The company said that, unlike earlier versions, the model uses GPT-5-level reasoning to manage interruptions, understand context better and carry out actions during conversations.

OpenAI also unveiled GPT-Realtime-Translate, a live translation model that can translate speech from more than 70 input languages into 13 output languages.

According to the company, the system keeps pace with the speaker during conversations instead of translating after pauses or completed sentences.

The third model, GPT-Realtime-Whisper, focuses on live speech transcription. It converts spoken words into text instantly while a person is talking.

“Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” OpenAI said.

Voice products have become a major focus for technology companies as more users interact with software through speech instead of typing. OpenAI said developers want systems that can manage tasks while conversations continue naturally.

The company pointed to customer support, travel, education, media and creator platforms as some of the areas expected to benefit from the new models.

OpenAI also described three growing patterns it sees in voice-based software.

The first is “voice-to-action”, where users speak naturally and the system completes tasks on their behalf. OpenAI said property platform Zillow is building an assistant that can help users search for homes, avoid certain neighbourhood conditions and book tours through voice requests.

Another pattern is “systems-to-voice”, where software provides spoken updates automatically. OpenAI gave the example of travel apps that could alert passengers about delayed flights, new boarding gates or transfer routes without users typing commands.

The third area is “voice-to-voice”, which focuses on live multilingual conversations. OpenAI said Deutsche Telekom is developing customer support systems that translate discussions instantly while both sides continue speaking in their preferred languages.

Travel company Priceline is also working on voice-based trip management tools, according to OpenAI. Travellers could eventually book flights, change hotel reservations and receive airport updates entirely through conversation.

Alongside the broader rollout, OpenAI added several new features to GPT-Realtime-2 aimed at improving live interactions.

Developers can now enable short phrases such as “let me check that” or “one moment while I look into it” before the system completes a request. OpenAI said this gives users clearer feedback while the model processes tasks in the background.

The model can also call multiple tools at once and explain those actions aloud during conversations. OpenAI said the system may say things like “checking your calendar” or “looking that up now” while working through requests.
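The pattern described here, speaking a short status phrase and then running several tool calls at once, can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the Realtime API, and the tool functions, their arguments and their return strings are hypothetical stand-ins.

```python
import asyncio
from typing import Callable


# Hypothetical tools; names, arguments and return values are illustrative only.
async def check_calendar(day: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a network call
    return f"calendar for {day}: free after 3pm"


async def look_up_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a network call
    return f"weather in {city}: clear"


async def handle_request(say: Callable[[str], None]) -> list[str]:
    # Give the user spoken feedback before the work starts.
    say("checking your calendar")
    # Run both tool calls concurrently instead of one after another.
    results = await asyncio.gather(
        check_calendar("Friday"),
        look_up_weather("Lagos"),
    )
    return list(results)


def run_demo() -> tuple[list[str], list[str]]:
    spoken: list[str] = []  # collects phrases that would be spoken aloud
    results = asyncio.run(handle_request(spoken.append))
    return spoken, results
```

Calling `run_demo()` returns the phrases that would have been spoken and the tool results in the order the tools were listed, since `asyncio.gather` preserves argument order.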

The company added that GPT-Realtime-2 recovers better from errors or failed requests instead of stopping conversations abruptly. It also supports a larger context window, increasing from 32K to 128K, allowing longer and more detailed conversations.

OpenAI further noted that the model has improved understanding of specialised terms, including healthcare vocabulary and proper nouns. Developers can also adjust how much reasoning power the model uses depending on the complexity of a request.

According to benchmark figures released by the company, GPT-Realtime-2 achieved higher scores than GPT-Realtime-1.5 on audio intelligence and instruction-following tests.

All three models are available through OpenAI’s Realtime API. The company said GPT-Realtime-Translate and GPT-Realtime-Whisper will be billed by the minute, while GPT-Realtime-2 pricing depends on token usage.
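The difference between the two billing approaches can be shown with a small estimator. This is a rough sketch, not OpenAI's pricing logic: the article gives no rates, so every rate is a caller-supplied placeholder, and the split into input and output token rates, as well as the lack of rounding to whole minutes, are assumptions.

```python
def per_minute_cost(audio_seconds: float, rate_per_minute: float) -> float:
    """Cost of a session billed by audio duration.

    rate_per_minute is a placeholder supplied by the caller; whether
    partial minutes are rounded up is not stated in the article.
    """
    return (audio_seconds / 60.0) * rate_per_minute


def per_token_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_million: float,
    output_rate_per_million: float,
) -> float:
    """Cost of a session billed by token usage.

    Separate input and output rates are an assumption made for
    illustration; both rates are placeholders supplied by the caller.
    """
    total = (
        input_tokens * input_rate_per_million
        + output_tokens * output_rate_per_million
    )
    return total / 1_000_000
```

With duration-based billing the cost depends only on how long the audio runs, while token-based billing grows with how much the model reads and produces, so a long conversation with many tool results could cost more than its length alone suggests.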

OpenAI said it has added safeguards to reduce misuse, including protections against spam, fraud and harmful content. The company also added that conversations can be stopped automatically if they break its safety rules.


Deepgram Selects Penguin to Optimize AI Inference Infrastructure for Enterprise Voice AI

Staff Writer — Fri, 20 Mar 2026 10:30:55 +0000

Penguin Solutions, the AI factory platform company, recently announced a strategic collaboration with Deepgram and Dell Technologies to architect and deploy a fully optimized, production-ready infrastructure aligned to Deepgram’s demanding enterprise voice AI requirements.

By leveraging its unique expertise in designing, building, deploying, and managing AI infrastructure with Dell PowerEdge servers and Dell PowerScale storage optimized for AI workloads, Penguin Solutions delivered an optimal solution to support and enhance Deepgram’s innovative Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Agent capabilities, while ensuring maximum reliability and performance.

As enterprise adoption of generative AI accelerates, organizations must adhere to stricter service level agreements (SLAs), which require infrastructure that can ensure low latency and high concurrent usage.

This Penguin-led deployment addresses these challenges by combining Deepgram’s innovative voice AI models with a purpose-built architectural design, a highly efficient deployment, and ongoing performance optimization.

“Modern AI workloads demand infrastructure that performs consistently and scales predictably under heavy loads, particularly for real-time inference applications like voice agents,” said Joe Castillo, vice president of sales at Penguin Solutions. “By partnering with Deepgram and utilizing proven Dell AI infrastructure, Penguin Solutions is delivering a validated, scalable, end-to-end architecture. Our comprehensive framework equips Deepgram with the optimized infrastructure needed to reliably and accurately deliver complex voice AI capabilities in healthcare, retail, and other industries.”

Drawing on its extensive experience with HPC and AI infrastructure, Penguin Solutions ensures that the underlying infrastructure meets the specific demands of Deepgram’s neural networks.

The architecture also incorporates Dell PowerScale storage and Dell PowerEdge XE7745 servers with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, which provide efficient inferencing that enables data-intensive voice applications to operate seamlessly in real-time environments.

“Deepgram is focused on delivering voice AI capabilities that meet the demanding performance, scalability, and reliability requirements of enterprise environments – something only Deepgram brings to the market today,” said Abe Pursell, vice president of partnerships and business development at Deepgram. “The infrastructure behind our platform has to be equally robust to support that level of innovation. Penguin Solutions demonstrated a deep understanding of our technical requirements, translating them into a sophisticated infrastructure environment that meets and exceeds expectations. This enables us to continue delivering the enterprise-class capabilities our customers rely on.”

“AI-driven voice applications are transforming how organizations engage with customers and patients, but success depends on a resilient, high-performance infrastructure foundation,” said David Noy, vice president, unstructured data solutions product management at Dell Technologies. “Our collaboration with Penguin Solutions demonstrates how AI-optimized Dell PowerScale storage and Dell PowerEdge servers with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs can accelerate enterprise AI adoption at scale. Together, we’re enabling Deepgram to deliver secure, low-latency voice AI experiences that power mission-critical innovation across healthcare and retail.”

The Deepgram-Penguin Solutions-Dell collaboration offers a comprehensive approach for enterprises looking to modernize their customer and employee experiences.

With Deepgram’s API-driven voice capabilities, Penguin Solutions’ AI services, and Dell’s powerful AI infrastructure, organizations can achieve highly accurate, real-time transcription and speech synthesis, all while maintaining strict data governance and control.


IBM, Deepgram Launch Real-Time AI Voice Solutions for Enterprises

Joan Aimuengheuwa — Tue, 24 Feb 2026 17:14:32 +0000

IBM and Deepgram have partnered to integrate Deepgram’s speech-to-text and text-to-speech technology into IBM’s watsonx Orchestrate platform.

This makes Deepgram IBM’s first voice partner, providing real-time transcription and voice features for enterprise clients.

The integration is designed to improve how companies handle complex audio environments, including background noise, accents, and natural conversation.

It also supports a wide range of languages and regional dialects, including multiple Arabic and Indian variants. Users will gain access to real-time captioning, natural-sounding voices, and options for custom tuning.

These tools can be applied across sectors such as healthcare and finance, supporting automated customer care, call analysis, and voice-driven data entry.

Scott Stephenson, Deepgram CEO and co-founder, said, “Voice is rapidly becoming the default interface between humans and technology, and enterprise deployments require a real-time platform that is accurate, low latency, and reliable at scale.

“By embedding Deepgram inside watsonx Orchestrate Agent Builder, IBM clients can build voice agents and voice-enabled workflows on top of a real-time foundation that has been developed and refined over more than a decade.”

Nick Holda, vice president of AI Technology Partnerships at IBM, added, “Our watsonx Orchestrate integration powered by Deepgram APIs introduces new speech recognition and transcription capabilities to IBM clients, refining and modernizing their operations.

“This collaboration aims to help enterprise organizations accelerate their AI initiatives and reinforces IBM’s open ecosystem, bringing choice and cutting-edge voice technology to partners and customers.”

The partnership is expected to strengthen IBM’s ability to provide flexible voice solutions to enterprise clients while expanding Deepgram’s reach to new customers through a trusted platform.

Deepgram provides real-time speech-to-text, text-to-speech, and full speech-to-speech features through cloud or on-premises APIs.

It has processed over 50,000 years of audio and transcribed more than one trillion words. IBM, for its part, provides hybrid cloud, AI, and consulting solutions to clients in over 175 countries.


Deepgram Earns AWS Generative AI Competency to Strengthen Voice AI Solutions

Joan Aimuengheuwa — Fri, 03 Oct 2025 06:51:22 +0000

Deepgram has earned the Amazon Web Services (AWS) Generative AI Competency, a recognition that strengthens the company’s place among trusted partners helping organisations deploy advanced artificial intelligence solutions at scale.

The designation comes after a demanding evaluation that required Deepgram to demonstrate technical strength, verified customer success, and real-world deployments. The recognition signals that its voice technologies are powerful, secure and production-ready.

Abe Pursell, vice president of Business Development and Partnerships at Deepgram, explained the importance of the achievement. “Generative AI is one of the most transformative technologies of our time — but in order for enterprises to adopt it with confidence, they need proof it works at scale and integrates seamlessly into their existing stack. This recognition from AWS gives our customers exactly that peace of mind. It shows Deepgram’s voice AI solutions have already been tested, vetted, and proven in the real world.”

For customers, the benefit isn’t limited to trust. The partnership brings closer alignment with AWS services such as Amazon Bedrock, Amazon Connect, and Amazon SageMaker. It also enables enterprises to take advantage of AWS Marketplace access, Private Pricing Agreements (PPAs), and AWS credits, factors that can reduce costs and speed up deployment.

According to Pursell, “For customers, collaborating with an AWS Generative AI Competency provider like Deepgram translates into faster time-to-value, reduced total cost of ownership (TCO), and peace of mind that their investment is future-proofed within the AWS GenAI ecosystem.”

The AWS Competency Programme is designed to help organisations identify partners with proven expertise in using AWS tools and infrastructure to build and integrate generative AI solutions.

For Deepgram, it represents an endorsement of years spent refining voice-native models capable of handling speech-to-text, text-to-speech, and speech-to-speech tasks with speed and accuracy.

With more than 200,000 developers building on its platform and over a trillion words already transcribed, the company has established itself as a significant player in the voice AI market. For customers ranging from startups to global enterprises, Deepgram’s services now stand on even stronger ground within AWS’s growing generative AI space.
