UPDATED 13:00 EST / JUNE 27 2024

Google uses AI to bring 110 new languages to Translate

Google LLC said today that it’s bringing 110 new languages to its web and smartphone translation app using the power of artificial intelligence, making it more comprehensive than ever with 243 languages in total.

This is the largest expansion to date for Google Translate. The previous major expansion came in 2022, when the company brought 24 new languages to the app using zero-shot machine translation, in which a language model learns to translate a language without ever seeing an example translation.

The company employed PaLM 2, a transformer-based large language model developed by Google Research that first powered Bard, Google’s AI chatbot. Bard eventually evolved into Gemini, which is now powered by the company’s AI model of the same name. PaLM 2 was trained using Pathways, Google’s machine learning infrastructure, on a vast dataset of human language containing more than 1.56 trillion words, and the model has 250 billion parameters.

Given the size of this dataset, Google said, PaLM 2 can attain unprecedented fluency with written languages, and during testing it demonstrated an impressive ability to perform linguistic tasks, including understanding idiomatic phrases. However, unlike Gemini, PaLM 2 cannot understand or generate images or work with audio.

Although choosing which 110 languages to add was not an easy task, Google said it aimed for the “most common varieties of each language.” The company also picked some languages that are on the brink of extinction.

“From Cantonese to Qʼeqchiʼ, these new languages represent more than 614 million speakers, opening up translations for around 8% of the world’s population,” said Isaac Caswell, a senior software engineer at Google Translate. “Some are major world languages with over 100 million speakers. Others are spoken by small communities of Indigenous people, and a few have almost no native speakers but active revitalization efforts.”

Cantonese has long been one of the most requested languages for the app. Because it often overlaps with Mandarin in writing, it is difficult to find data to train models. Manx, a Celtic language from the Isle of Man in the Irish Sea, was also added. It almost went extinct with the death of its last native speaker in 1974, but has since seen an island-wide revival and now has thousands of speakers.

Caswell said that PaLM 2 was particularly useful for learning to translate languages that are closely related to each other, such as Awadhi and Marwadi, which are similar to Hindi. The same capability made learning more efficient for French creoles such as Seychellois Creole and Mauritian Creole.

“About a quarter of the new languages come from Africa, representing our largest expansion of African languages to date, including Fon, Kikongo, Luo, Ga, Swati, Venda and Wolof,” added Caswell.

With more than 7,000 languages spoken throughout the world, Google has a long way to go to include most of them in Google Translate. Caswell said the company will continue to work with native speakers and expert linguists as part of its commitment to bring even more languages to the platform.

Google said that it will be rolling out the new languages over the next few days. Google Translate is accessible on the web and via the app on Android and iOS devices.

Image: Google
