Meta has announced the launch of its Universal Speech Translator (UST) project, which aims to build AI systems capable of real-time speech-to-speech translation across all languages, including those that are primarily spoken and have no widely used written form.
“Meta AI built the first speech translator that works for languages that are primarily spoken rather than written.
We’re open-sourcing this so people can use it for more languages,” said Mark Zuckerberg, cofounder and CEO of Meta.
Speech translation systems in use today typically rely on converting speech to text as an intermediate step. Primarily oral languages, however, lack the transcribed text that approach requires, so the company developed a direct speech-to-speech translation method instead.
“We used speech-to-unit translation (S2UT) to translate input speech directly to a sequence of acoustic units, in the path previously pioneered by Meta,” the company said in its blog post, referencing its earlier research on speech-to-speech translation.
“Then, we generated waveforms from the units. In addition, UnitY was adopted for a two-pass decoding mechanism, where the first-pass decoder generates text in a related language (Mandarin) and the second-pass decoder creates units,” it stated.
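To make the described flow concrete, here is a minimal, hypothetical sketch of a two-pass "speech → text → discrete units → waveform" model in PyTorch. It is not Meta's implementation: the module sizes, layer counts, names, and the simple concatenation used to condition the second-pass decoder are all assumptions made for illustration only.

```python
# Illustrative sketch of two-pass speech-to-unit decoding (not Meta's code).
# Assumed details: dimensions, layer counts, and concatenation-based conditioning.
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Encodes a log-mel spectrogram into a sequence of hidden states."""
    def __init__(self, n_mels=80, d_model=256):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, mels):                  # (B, T, n_mels)
        return self.encoder(self.proj(mels))  # (B, T, d_model)

class FirstPassTextDecoder(nn.Module):
    """First pass: predicts text tokens in a related written language."""
    def __init__(self, vocab_size=1000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, text_tokens, memory):
        h = self.decoder(self.embed(text_tokens), memory)
        return self.out(h), h                 # logits and hidden states

class SecondPassUnitDecoder(nn.Module):
    """Second pass: predicts discrete acoustic units, conditioned on the
    speech encoder output and the first-pass decoder states."""
    def __init__(self, n_units=1000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(n_units, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, n_units)

    def forward(self, unit_tokens, memory):
        h = self.decoder(self.embed(unit_tokens), memory)
        return self.out(h)                    # unit logits

# Dummy end-to-end forward pass with teacher forcing.
mels = torch.randn(1, 120, 80)              # ~1.2 s of 80-dim log-mel frames
text = torch.randint(0, 1000, (1, 20))      # first-pass targets (e.g. Mandarin text)
units = torch.randint(0, 1000, (1, 60))     # second-pass targets (acoustic units)

enc, text_dec, unit_dec = SpeechEncoder(), FirstPassTextDecoder(), SecondPassUnitDecoder()

memory = enc(mels)
text_logits, text_states = text_dec(text, memory)
# Condition the unit decoder on encoder output plus first-pass states.
unit_logits = unit_dec(units, torch.cat([memory, text_states], dim=1))
print(text_logits.shape, unit_logits.shape)  # (1, 20, 1000) (1, 60, 1000)

# In a full system, the predicted unit sequence would then be converted
# into a waveform by a separately trained unit-based vocoder.
```

The key design point the sketch tries to capture is that the second pass never produces text in the source or target speech language; it emits discrete acoustic units, which a unit vocoder turns into audio, so no written form of the spoken language is needed.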