UPDATED 09:30 EST / AUGUST 22 2023


Meta AI’s SeamlessM4T model enables universal, on-demand translation for nearly 100 languages

A universal translator akin to the Babel Fish from “The Hitchhiker’s Guide to the Galaxy” might soon be possible.

It’s all thanks to Meta Platforms Inc.’s Fundamental Artificial Intelligence Research team, which has open-sourced a new AI model designed to provide instant translations across languages and formats. Announced today, SeamlessM4T is a foundational multilingual and multitask model that can translate and transcribe nearly 100 languages on demand.

Such a capability has long been the stuff of science fiction, but Meta says SeamlessM4T puts it on the verge of becoming reality. The model is being open-sourced and made available to the AI research community.

According to Meta, SeamlessM4T supports speech recognition and speech-to-text translation for 100 input and output languages, and speech-to-speech translation for almost 100 input and 35 output languages. It can also perform text-to-text translation and text-to-speech translation for 100 input and 35 output languages.
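The per-task coverage above can be summarized as a simple lookup. This is purely an illustrative restatement of the counts in the article; the names below are our own, not part of any SeamlessM4T API.

```python
# Language coverage per task, as reported in Meta's announcement.
# Each entry maps a task to (input languages, output languages).
SEAMLESS_M4T_COVERAGE = {
    "speech_recognition": (100, 100),
    "speech_to_text_translation": (100, 100),
    "speech_to_speech_translation": (100, 35),
    "text_to_text_translation": (100, 35),
    "text_to_speech_translation": (100, 35),
}

def coverage(task: str) -> tuple[int, int]:
    """Return (input, output) language counts for a supported task."""
    return SEAMLESS_M4T_COVERAGE[task]
```

Note that speech output is the limiting direction: every task that produces audio tops out at roughly 35 output languages, while text output spans the full set.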

In a blog post, Meta explains that building a universal translator has always been challenging because existing systems only cover a small fraction of the world’s languages. What’s more, there has always been a need to rely on different AI models for the variety of translation tasks, such as speech-to-text, speech-to-speech and text-to-text. Such systems need to be trained using vast amounts of data and only perform well for a single modality type.

As a unified multilingual model for all modalities, SeamlessM4T changes that, providing on-demand translations that allow people who speak different languages to communicate far more easily than before. What’s more, it’s claimed to provide a significant improvement in translation performance for low- and mid-resource languages, where data is lacking.

At the same time, it matches the performance of existing models for high-resource languages such as English, German and Spanish. It also recognizes languages by itself, without the need for a separate language identification model, Meta said.

Meta explained how it built SeamlessM4T using a redesigned Fairseq sequence modeling toolkit, combined with the multitask UnitY model architecture:

“The multitask UnitY model consists of three main sequential components. Text and speech encoders have the task of recognizing speech input in nearly 100 languages. The text decoder then transfers that meaning into nearly 100 languages for text followed by a text-to-unit model to decode into discrete acoustic units for 36 speech languages. Each of these components in the multitask UnitY are pre-trained by a component model for a sub-task of text-to-text, speech-to-text and speech-to-speech. The decoded discrete units are then converted into speech using a multilingual HiFi-GAN unit vocoder.”
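The cascade the researchers describe can be sketched as four stages chained together: a speech encoder, a text decoder, a text-to-unit model and a unit vocoder. The sketch below uses stand-in stub functions of our own invention to show the data flow only; it is not Meta's code and does not reflect the real model internals.

```python
# Illustrative sketch of the multitask UnitY pipeline described above.
# Every component here is a dummy stand-in, not Meta's implementation.

def speech_encoder(audio: list[float]) -> dict:
    """Stand-in: map raw audio frames to a language-agnostic representation."""
    return {"embedding": audio, "detected_lang": "eng"}

def text_decoder(encoded: dict, target_lang: str) -> str:
    """Stand-in: decode the representation into target-language text."""
    return f"[{target_lang} text from {len(encoded['embedding'])} frames]"

def text_to_unit(text: str) -> list[int]:
    """Stand-in: map decoded text to discrete acoustic units."""
    return [hash(ch) % 1000 for ch in text]

def unit_vocoder(units: list[int]) -> list[float]:
    """Stand-in for the multilingual HiFi-GAN vocoder: units -> waveform."""
    return [u / 1000.0 for u in units]

def speech_to_speech(audio: list[float], target_lang: str) -> list[float]:
    """Chain the four stages, mirroring the cascade Meta describes."""
    encoded = speech_encoder(audio)
    text = text_decoder(encoded, target_lang)
    units = text_to_unit(text)
    return unit_vocoder(units)
```

The design point the quote makes is that each stage is pre-trained separately on its own sub-task (text-to-text, speech-to-text, speech-to-speech) before being composed, which is what lets one model serve every modality pairing.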

In terms of its performance, Meta’s researchers claim SeamlessM4T outperforms other models, with “state-of-the-art results” for nearly 100 languages plus multitask support across automatic speech recognition, speech-to-text, speech-to-speech, text-to-speech and text-to-text translation.

To evaluate SeamlessM4T in the real world, Meta’s researchers used the BLASER 2.0 translation evaluation metric, which revealed that it performs better than existing models in noisy environments and with speaker variations.

Meta said it’s publicly releasing SeamlessM4T under a Creative Commons BY-NC 4.0 license, meaning that other AI researchers and developers are free to take this work and build atop it. “We believe SeamlessM4T is an important breakthrough in the AI community’s quest toward creating universal multitask systems,” the researchers wrote. “Keeping with our approach to open science, we are excited to share our model publicly to allow researchers and developers to build on this technology.”

As well as the model itself, Meta is releasing the metadata of SeamlessAlign, the multimodal translation dataset used to train SeamlessM4T, totaling more than 265,000 hours of mined speech and text alignments. It’s also making available SONAR, a suite of speech and text sentence encoders for mining monolingual datasets, and Stopes, a library for multimodal data processing and parallel data mining.

Meta said its research team has been working to create the foundation of a universal translator for several years already. Last year, it released a text-to-text machine translation model called No Language Left Behind, which supports more than 200 languages and is now used by Wikipedia for article translations.

Later that year, it announced its first Universal Speech Translator, providing direct speech-to-speech translation for an array of languages, including Hokkien, which lacks a widely used writing system. Through those efforts, it also built SpeechMatrix, a large-scale multilingual speech-to-speech translation dataset to power supervised representation learning for AI models.

Meta followed up earlier this year with its Massively Multilingual Speech model, which performs tasks including speech recognition, language identification and speech synthesis for more than 1,100 languages. Its researchers said SeamlessM4T builds on all of these initiatives to create a high-performance multilingual and multimodal translation system based on a single AI model.

Meta said its next task is to explore how SeamlessM4T can serve as the foundation for new communication capabilities and bring us much closer to a world in which everyone can be understood.

Image: Rawpixel.com/Freepik
