

Google LLC today introduced what it says is an experimental new system for speech translation that removes many of the steps involved in its earlier models.
Even better, the synthesized translations it produces retain the sound of the original speaker’s voice, so it actually sounds like the person is speaking in the target language.
Google said its Translatotron tool simplifies the complex process of translating speech from one language into another. Existing systems such as Google Translate take a roundabout, three-stage approach: they first transcribe the original speech into text, then translate that text into the target language, and finally use the translated text to synthesize speech in the new language.
Each of those stages can slow things down. Translatotron speeds up the process by using a single model that translates speech directly, without first converting it into text.
“This system avoids dividing the task into separate stages,” Google AI engineers Ye Jia and Ron Weiss wrote in a blog post. The result should be faster translation and fewer compounding errors, they said.
“To the best of our knowledge, Translatotron is the first end-to-end model that can directly translate speech from one language into speech in another language,” Jia and Weiss added. “It is also able to retain the source speaker’s voice in the translated speech.”
The Translatotron system takes “spectrograms” as its input. A spectrogram is a visual representation of how the frequency content of an audio signal varies over time. An encoder network is used to capture the speaker’s voice, while “multitask learning” is used to predict the words being spoken and translate them into the target language.
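To make the idea of a spectrogram more concrete, here is a minimal sketch of how one can be computed. The audio is sliced into short overlapping frames, and each frame is converted into a list of frequency magnitudes. The function names and the frame and hop sizes are illustrative assumptions, not details of Google’s system. The sketch uses a slow but readable discrete Fourier transform. Production speech systems typically use a fast Fourier transform and often a log-compressed, mel-scale variant of the spectrogram.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n; tapers frame edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal: list[float], frame_len: int = 64, hop: int = 32) -> list[list[float]]:
    """Magnitude spectrogram: one row per time frame, one column per frequency bin.

    Illustrative sketch only: a naive O(n^2) DFT per frame, no mel scaling or log.
    """
    window = hann(frame_len)
    frames = []
    # Slide a window across the signal, advancing `hop` samples each step.
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        row = []
        # Keep only non-negative frequencies (bins 0..frame_len/2) for a real signal.
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(chunk))
            row.append(abs(acc))
        frames.append(row)
    return frames


# Usage: a pure 1 kHz tone sampled at 8 kHz. With 64-sample frames, each bin
# spans 8000 / 64 = 125 Hz, so the energy should peak in bin 1000 / 125 = 8.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(256)]
spec = spectrogram(tone, frame_len=64, hop=32)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

Each row describes one moment in time and each column one frequency band. Translatotron consumes this kind of time-frequency picture rather than raw audio samples.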
Google admits the system is still experimental. It also acknowledges that its translation quality, as measured by the BLEU score commonly used to evaluate machine translation, is currently lower than that of conventional translation tools. However, Google said it’s working to improve the system.
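BLEU works by comparing a system’s output text against a human reference translation. It counts how many word sequences (n-grams) of length one to four the two share, and it penalizes outputs that are too short. The sketch below is a simplified, unsmoothed, single-reference sentence-level version written for illustration. Published results, including Google’s, are usually computed at the corpus level with standard tooling. Because BLEU compares text, speech output has to be transcribed before it can be scored this way.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-token sequence in `tokens`."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified single-reference BLEU with uniform weights and no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # "Modified" precision: clip each n-gram's count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty: shorter-than-reference output is scaled down.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum)


ref = "the cat sat on the mat"
exact = sentence_bleu("the cat sat on the mat", ref)
short = sentence_bleu("the cat sat on the", ref)
unrelated = sentence_bleu("a dog ran far away today", ref)
```

Here an exact match scores 1.0. A truncated but otherwise correct output is scaled down by the brevity penalty to about 0.82. An unrelated sentence scores 0.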
Analyst Holger Mueller of Constellation Research Inc. told SiliconANGLE that Translatotron was an interesting concept, noting that transcription is becoming table stakes for cloud providers.
“The combination of understanding speech and then translating it to a desired language is raising the game and that’s what Google is doing with the Translatotron,” Mueller said. “We are getting close to the point where kids will be asking why they should even bother with learning a foreign language.”
Indeed, within a few years it really might not be necessary to speak more than one language. One possible application for Translatotron could be the new “Interpreter Mode” found in Google Assistant, which was added to Google Home speakers earlier this year. Interpreter Mode currently relies on Google’s conventional translation tools and can translate speech between 27 language pairs.
For a more in-depth look at how Translatotron works, Google has a whitepaper on the subject.