UPDATED 22:32 EDT / MAY 15 2019

Google announces Translatotron tool for translating speech in the speaker’s original voice

Google LLC today introduced what it says is an experimental new system for speech translation that removes many of the steps involved in its earlier models.

Even better, the synthesized translations it produces retain the sound of the original speaker’s voice, so it actually sounds like the person is speaking in the target language.

Google said its Translatotron tool simplifies the complex process of translating speech into different languages. Existing translation systems such as Google Translate work in three stages: first transcribing the original speech into text, then translating that text into the target language, and finally synthesizing speech from the translated text.

Each of these stages can slow things down. Translatotron speeds things up by using a single model that eliminates the need to transcribe speech to text first.

“This system avoids dividing the task into separate stages,” Google AI engineers Ye Jia and Ron Weiss wrote in a blog post. The result should be faster translation and fewer compounding errors, they said.

“To the best of our knowledge, Translatotron is the first end-to-end model that can directly translate speech from one language into speech in another language,” Jia and Weiss added. “It is also able to retain the source speaker’s voice in the translated speech.”

The Translatotron system works on “spectrograms,” visual representations of how the frequency spectrum of an audio signal varies over time, which serve as its input data. An encoder network captures the speaker’s voice, while “multitask learning” is used to predict the words being spoken and translate them into the target language.
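
For readers unfamiliar with spectrograms, the following is a minimal Python sketch of how one can be computed from raw audio samples. It is not Google's code: the function names, the frame size and the hop length are illustrative assumptions, and it uses a naive discrete Fourier transform for clarity where a production system would use an optimized FFT library.

```python
import cmath
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Split audio into overlapping frames and return per-frame magnitudes.

    Each frame yields frame_size // 2 + 1 frequency bins. Hypothetical
    helper for illustration only, not part of Translatotron.
    """
    # A Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        spectrum = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for frequency bin k (O(N^2), fine for a demo).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def sine_wave(freq_hz, sample_rate, num_samples):
    """Generate a pure tone to stand in for recorded speech."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


if __name__ == "__main__":
    tone = sine_wave(freq_hz=1000, sample_rate=8000, num_samples=256)
    spec = magnitude_spectrogram(tone)
    loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
    print(len(spec), "frames; loudest frequency bin in frame 0:", loudest_bin)
```

Each row of the result is one time slice and each column one frequency band, which is the kind of time-by-frequency grid the article describes.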

Google admits the system is still experimental and that its BLEU score, a standard measure of machine translation quality, is currently lower than that of conventional translation tools. However, Google said it’s working to improve the system.
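
As a rough illustration of what a BLEU score measures, here is a simplified Python sketch that compares a candidate translation with a single reference by counting overlapping word sequences. It is an assumption-laden toy, not the evaluation Google ran: it scores one sentence against one reference, applies no smoothing, and uses a hypothetical function name, whereas published results rely on established corpus-level tools.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def simple_bleu(candidate, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU (no smoothing)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each n-gram's count by how often it appears in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # Without smoothing, any zero precision zeroes the score.
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))
    print(simple_bleu("the cat sat on a mat", reference))
```

A candidate identical to the reference scores 1.0, and the score drops as fewer word sequences match, which is the sense in which a lower BLEU score indicates lower translation quality.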

Analyst Holger Mueller of Constellation Research Inc. told SiliconANGLE that Translatotron was an interesting concept, noting that transcription is becoming table stakes for cloud providers.

“The combination of understanding speech and then translating it to a desired language is raising the game and that’s what Google is doing with the Translatotron,” Mueller said. “We are getting close to the point where kids will be asking why they should even bother with learning a foreign language.”

Indeed, within a few years it really might not be necessary to speak more than one language. One possible application for Translatotron could be the new “Interpreter Mode” found in Google Assistant, which was added to Google Home speakers earlier this year. Interpreter Mode currently relies on Google’s conventional translation tools and can translate speech between 27 language pairs.

For a more in-depth look at how Translatotron works, Google has a whitepaper on the subject.

Image: Google
