UPDATED 09:30 EDT / AUGUST 22 2023

Meta AI’s SeamlessM4T model enables universal, on-demand translation for hundreds of languages

A universal translator akin to the Babel Fish from “The Hitchhiker’s Guide To The Galaxy” might soon be possible.

It’s all thanks to Meta Platforms Inc.’s Fundamental Artificial Intelligence Research team, which has open-sourced a new AI model that’s designed to provide instant translations in any language and format. Announced today, SeamlessM4T is a foundational multilingual and multitask model that’s able to translate and transcribe hundreds of languages on demand.

Such a capability has long been a staple of science fiction, but Meta says it’s on the verge of becoming possible now that SeamlessM4T is available to the AI research community.

According to Meta, SeamlessM4T supports speech recognition and speech-to-text translation for nearly 100 input and output languages, and speech-to-speech translation for nearly 100 input languages and 35 output languages plus English. It can also perform text-to-text translation for nearly 100 languages and text-to-speech translation for nearly 100 input languages and 35 output languages plus English.

In a blog post, Meta explains that building a universal translator has always been challenging because existing systems only cover a small fraction of the world’s languages. What’s more, there has always been a need to rely on different AI models for the variety of translation tasks, such as speech-to-text, speech-to-speech and text-to-text. Such systems need to be trained using vast amounts of data and only perform well for a single modality type.

As a unified multilingual model for all modalities, SeamlessM4T changes that, providing on-demand translations that allow people who speak different languages to communicate far more easily than before. What’s more, it’s claimed to provide a significant improvement in translation performance for low- and mid-resource languages, where data is lacking.

At the same time, it matches the performance of existing models for high-resource languages such as English, German and Spanish. It also recognizes languages by itself, without the need for a separate language identification model, Meta said.

Meta explained how it built SeamlessM4T using a redesigned Fairseq sequence modeling toolkit, combined with the multitask UnitY model architecture:

“The multitask UnitY model consists of three main sequential components. Text and speech encoders have the task of recognizing speech input in nearly 100 languages. The text decoder then transfers that meaning into nearly 100 languages for text followed by a text-to-unit model to decode into discrete acoustic units for 36 speech languages. Each of these components in the multitask UnitY are pre-trained by a component model for a sub-task of text-to-text, speech-to-text and speech-to-speech. The decoded discrete units are then converted into speech using a multilingual HiFi-GAN unit vocoder.”
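The sequence of components Meta describes can be sketched as a short Python pipeline. This is only an illustration of the data flow under loud assumptions: every class and function name here (Speech, Translation, encode, decode_text, text_to_units, vocode, translate) is hypothetical, the bodies are toy placeholders rather than neural networks, and none of it reflects Meta's actual code or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


# All names below are hypothetical stand-ins, not Meta's implementation.
@dataclass
class Speech:
    samples: List[float]


@dataclass
class Translation:
    text: str
    speech: Optional[Speech] = None


def encode(source: Union[str, Speech]) -> List[float]:
    """Stage 1: a text or speech encoder maps the input to a representation."""
    if isinstance(source, Speech):
        return list(source.samples)
    return [float(ord(ch)) for ch in source]


def decode_text(representation: List[float], target_lang: str) -> str:
    """Stage 2: the text decoder produces text in the target language."""
    return f"<{target_lang}> text from {len(representation)} encoder states"


def text_to_units(text: str) -> List[int]:
    """Stage 3: the text-to-unit model emits discrete acoustic units."""
    return [ord(ch) % 100 for ch in text]


def vocode(units: List[int]) -> Speech:
    """Stage 4: a unit vocoder (HiFi-GAN in Meta's description) makes audio."""
    return Speech(samples=[u / 100.0 for u in units])


def translate(source: Union[str, Speech], target_lang: str,
              want_speech: bool = False) -> Translation:
    """Run the stages in order; speech output adds the last two stages."""
    representation = encode(source)
    text = decode_text(representation, target_lang)
    if not want_speech:
        return Translation(text=text)
    return Translation(text=text, speech=vocode(text_to_units(text)))
```

In this sketch, a call such as translate(Speech([0.1, 0.2]), "spa", want_speech=True) passes through all four stages, while a text-only request stops after the text decoder. That mirrors the point of the design as Meta describes it: one model serves every task, and the tasks differ only in which stages they use.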

In terms of its performance, Meta’s researchers claim SeamlessM4T outperforms other models, with “state-of-the-art results” for nearly 100 languages plus multitask support across automatic speech recognition, speech-to-text, speech-to-speech, text-to-speech and text-to-text translation.

To evaluate SeamlessM4T in the real world, Meta’s researchers used the BLASER 2.0 translation evaluation metric, which revealed that it performs better than existing models in noisy environments and with speaker variations.

Meta said it’s publicly releasing SeamlessM4T under a Creative Commons BY-NC 4.0 license, meaning that other AI researchers and developers are free to take this work and build on it for noncommercial purposes. “We believe SeamlessM4T is an important breakthrough in the AI community’s quest toward creating universal multitask systems,” the researchers wrote. “Keeping with our approach to open science, we are excited to share our model publicly to allow researchers and developers to build on this technology.”

As well as the model itself, Meta is releasing the metadata of SeamlessAlign, the multimodal translation dataset it used to train SeamlessM4T, which totals more than 265,000 hours of mined speech and text alignments. Also being made available are SONAR, a suite of speech and text sentence encoders used for data mining on monolingual datasets, and Stopes, a library for multimodal data processing and parallel data mining.

Meta said its research team has been working to create the foundation of a universal translator for several years already. Last year, it released a text-to-text machine translation model called No Language Left Behind, which supports more than 200 languages and is now used by Wikipedia for article translations.

Later that year, it announced its first Universal Speech Translator, providing direct speech-to-speech translation for an array of languages, including Hokkien, which lacks a widely used writing system. Through those efforts, it also built SpeechMatrix, a large-scale multilingual speech-to-speech translation dataset to power supervised representation learning for AI models.

Meta followed up earlier this year with its Massively Multilingual Speech model, which performs tasks including speech recognition, language identification and speech synthesis for more than 1,100 languages. Its researchers said SeamlessM4T builds on all of these initiatives to create a high-performance multilingual and multimodal translation system based on a single AI model.

Meta said its next task is to explore how SeamlessM4T can serve as the foundation for new communication capabilities and bring us much closer to a world in which everyone can be understood.

Image: Rawpixel.com/Freepik
