speech recognition Archives - Tech | Business | Economy

Nigeria’s Intron Launches Sahara v2 Voice AI Supporting 24 African Languages, 500 Accents

Joan Aimuengheuwa — Thu, 05 Mar 2026 16:44:36 +0000

Nigerian technology company Intron has launched a new voice recognition model designed to better understand African languages and accents, after years of complaints that global voice AI assistants often misinterpret local speech.

The model, called Sahara v2, supports 24 African languages and recognises more than 500 African English accents. The company said the system was trained using more than 14 million audio clips collected from over 40,000 speakers across Africa and the diaspora.

For many users on the continent, voice technology often struggles with everyday phrases and names. Common expressions can be misheard or completely distorted, making digital assistants unreliable for basic tasks.

Developers say the problem lies in how most global systems were built. Many were trained mainly on Western speech patterns and do not account for the tonal nature, accent variety and frequent language mixing common across African countries.

With Sahara v2, Intron says it wants to close that gap by building technology that listens to how people actually speak. The recordings used to train the system were gathered across a range of environments, including clinics, courtrooms, call centres, streets and offices.

The new model covers languages such as Hausa, Swahili, Yoruba, Igbo, Zulu, Twi, Kinyarwanda and Xhosa. In total, Intron says its systems now support 57 languages.

One of the additions is a bilingual speech recognition system that switches between English and Swahili. Intron developed the model with Kenya-based health provider Penda Health to better match how people naturally move between both languages in conversation.

The company also released a Hausa text-to-speech system designed to power local language voice assistants that can run continuously for services such as customer support.

Intron said the new system can also operate offline, allowing organisations to run voice tools locally where privacy or data security is a concern.

According to the company, Sahara v2 performs better on African speech than several widely used global models, including systems developed by Google, OpenAI, Amazon Web Services and Microsoft.

Testing carried out by the company showed stronger accuracy when recognising African names, locations, numbers and sector-specific terms used in areas such as finance, healthcare and telecommunications.

Several organisations have already begun using the system in their services. These include voice banking platforms, medical documentation tools, courtroom transcription systems and automated call centre software.

Ayo Oluleye, head of Data and Insights at ARM Investments, said the model improved the accuracy of automated transcription.

“Using Intron AI models, we’ve seen significant improvement in transcription and summaries compared to models we previously explored. Their systems capture context and nuance better, leading to more accurate results.”

Sarah Morris, chief product officer at Audere, said the system also performed well during testing. “In our testing, accuracy was excellent on several Southern African accents and APIs were robust with 99%+ success rates.”

Alongside the launch, Intron also released its first Africa Voice AI report for 2026, examining how voice technology is being developed and used across the continent.

The report aims to guide governments, businesses, investors and researchers working to expand digital services that rely on speech technology.

Tobi Olatunji, chief executive of Intron, said the project shows what happens when technology is designed with local languages in mind.

“Sahara v2 proves that when technology is built with deep cultural and linguistic understanding, amazing things can happen, and we’re just getting started.”


Amazon Launches Nova Sonic, a Unified Voice Model for Natural Conversations

Joan Aimuengheuwa — Tue, 08 Apr 2025 15:26:17 +0000

Amazon has released Nova Sonic, a new voice model designed to improve the quality of machine-human conversations by unifying speech recognition and generation into one system.

This could change how machines talk, and more importantly, how they listen. Forget about the traditional stack of voice recognition patched together with text generation and speech synthesis. That model is clunky, robotic, and frankly, outdated.

Today, Amazon introduced Nova Sonic, a unified system that listens, understands, and responds in real time, and it does so in a way that feels less like talking to a machine and more like having a conversation.

To be clear, this isn’t another Alexa upgrade. Nova Sonic is in another league. It’s built to capture the nuances that matter — your tone, your pace, your hesitations. When you pause, it pauses. When you sound anxious, it softens its tone. It picks up on the cues humans take for granted, but machines have long missed.

Developers can now access Nova Sonic through Amazon Bedrock, using a streaming API that opens the door for a new kind of voice-enabled experience — not just in customer service, but across sectors like travel, healthcare, education, and even entertainment. This is no gimmick. It’s not just about answering questions faster. It’s about answering them better.

If you’ve ever been frustrated speaking to a virtual assistant that cuts you off, mishears you, or takes a second too long to reply, Nova Sonic aims to fix that. It doesn’t interrupt. It waits its turn. It handles overlapping dialogue with the kind of grace that’s been missing from digital assistants.

According to Amazon, it clocks in with an average latency of 1.09 seconds. That’s faster than OpenAI’s highly touted Realtime API, which hits 1.18 seconds.

Let’s talk performance. On Multilingual LibriSpeech, a widely used benchmark, Nova Sonic recorded a word error rate of just 4.2% across English, French, German, Spanish, and Italian. Word error rate counts the words a system gets wrong, drops or adds relative to a reference transcript, so that figure works out to roughly four errors for every 100 words spoken.
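For readers unfamiliar with the metric, word error rate can be sketched in a few lines of Python. This is a minimal illustration of the standard definition (word-level edit distance divided by the number of reference words), not Amazon's evaluation code; the function name `word_error_rate` and the sample sentences are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcript pair: one word misheard and one word dropped.
example = word_error_rate("send money to ada", "send honey to")
```

In the hypothetical pair, two of the four reference words are wrong or missing, so the function returns 0.5, or a 50% word error rate. Published benchmark scores usually also normalise punctuation and numerals before comparing, which this sketch leaves out.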

In multi-speaker, loud environments, Amazon says it outperformed OpenAI’s GPT-4o-transcribe by 46.7%. Those aren’t small wins. Those are statement numbers.
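The article does not say how the 46.7% figure was calculated. One common reading of such a claim is a relative reduction in error rate, and the sketch below works through that arithmetic. The function name and both error rates are hypothetical values chosen only for illustration; they are not measurements from Amazon or OpenAI.

```python
def relative_error_reduction(baseline_rate: float, new_rate: float) -> float:
    """Fraction of the baseline error rate that the new system removes."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


# Hypothetical error rates, for illustration only.
baseline_rate = 0.30  # assumed baseline: 30 errors per 100 words
new_rate = 0.16       # assumed new system: 16 errors per 100 words

reduction_percent = round(relative_error_reduction(baseline_rate, new_rate) * 100, 1)
```

With these made-up inputs the relative reduction comes to 46.7%, even though the absolute gap is 14 percentage points. This is why it matters whether a vendor reports a relative or an absolute improvement.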

Again, the strength of Nova Sonic lies in what Amazon calls unification. It doesn’t rely on three separate models stitched together. Instead, one model handles the full loop — from recognising speech to generating a human-like reply.

That unity preserves the acoustic context: the style, rhythm, emotion, and intent in your voice. The result? Conversations that feel less scripted and more spontaneous.

In real-world use, Nova Sonic is already making waves. A virtual travel assistant built on the model shifts tone mid-conversation when a customer’s excitement turns into cost-related anxiety. The assistant doesn’t just respond with prices — it reassures. It mirrors the emotional flow of the dialogue.

That adaptability isn’t just good UX. It’s smart business. It gives companies the chance to build agents that feel more attentive, more human, and less like automated menu trees. Enterprise users can even build assistants that pull live reports, reference internal data, and follow up with insightful questions — without making the speaker repeat themselves or rephrase for clarity.

“Nova Sonic is the most cost-efficient AI voice model on the market,” Amazon says. It’s reportedly 80% cheaper to run than OpenAI’s GPT-4o, and some of its components are already powering Alexa+, the company’s upgraded assistant.

Rohit Prasad, Amazon’s SVP and Head Scientist for AGI, said: “Nova Sonic builds on Amazon’s expertise in large orchestration systems.” He added that the model doesn’t just respond; it knows when to reach out to APIs, fetch real-time information, and act. “It routes user requests to different APIs,” Prasad said, describing how the model determines when and how to access tools or databases to complete a task.

This model isn’t a one-off. It’s the start of something bigger. Amazon has made clear that it’s doubling down on its pursuit of intelligent systems that don’t just process language, but understand it in context — whether it’s voice, video, or even sensory data. Nova Sonic marks the first major step in opening up that technology to developers, not just keeping it in-house.

Amazon says it’s time for digital voices to grow up. And Nova Sonic? It just gave them a voice worth listening to.
