AI
Google LLC’s newest artificial intelligence tool promises to bring real-time translation to every smartphone user, enabling more natural and fluid conversations between speakers of different languages.
That’s according to a blog post published today announcing Gemini 3.5 Live Translate, which the company describes as its most advanced audio model for speech-to-speech translation to date. Traditional translation tools are cumbersome because they process speech in turns, waiting for a speaker to finish before translating. Gemini 3.5 Live Translate is much faster: According to Google, it listens continuously as someone talks, translates what they’re saying and speaks the translation to the other person in that person’s language.
This means people who don’t share a language will be able to hold almost-natural conversations with only a couple of seconds of delay, similar, perhaps, to long-distance calls in the days of rotary telephones.
Google Product Manager Anuda Weerasinghe and Senior Staff Software Engineer Tony Lu said in the co-authored blog post that Gemini 3.5 Live Translate can automatically detect which language a person is speaking, so there’s no need to set anything up first. It supports more than 70 languages at launch, which works out to “thousands” of possible language pairings.
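The “thousands” figure follows from simple counting. As a rough check, the sketch below assumes exactly 70 languages and that every language can be paired with every other one; neither assumption is confirmed by Google, and the function name is illustrative only.

```python
from math import comb


def language_pairings(n_languages: int) -> tuple[int, int]:
    """Count possible pairings among n_languages.

    Returns (unordered, directed):
    - unordered: distinct two-language conversations (A with B)
    - directed: translation directions (A to B and B to A counted separately)
    """
    unordered = comb(n_languages, 2)
    directed = n_languages * (n_languages - 1)
    return unordered, directed


# Assumption: exactly 70 languages, all mutually translatable.
unordered, directed = language_pairings(70)
print(unordered, directed)  # 2415 4830
```

Either way of counting lands in the thousands, and the totals grow with each language added beyond 70.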
The company is making it available to developers and enterprises, so the capability is likely to be integrated with third-party communication platforms in the near future. Of course, it’s also being rolled out to everyone directly in the Google Translate application.
This isn’t Google’s first attempt at real-time translation, but earlier efforts have always required specific hardware such as the company’s own smartphones and earbuds. Gemini 3.5 Live Translate is different in that it can work on any smartphone. It’s also based on a new architecture that changes how the translation process works.
It relies on “continuous stream translation,” which means it doesn’t have to wait until one person has finished speaking before it starts generating a response. The result is a much more fluid translated conversation.
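To illustrate the difference in timing, here is a toy sketch that contrasts turn-based translation with a streaming approach. It is not Google’s architecture or API: the three-word lexicon, the word-by-word lookup and every function name are hypothetical stand-ins, and real systems must also handle word order and context across languages.

```python
from typing import Iterable, Iterator

# Hypothetical stand-in for a translation model: a tiny Spanish-to-English lookup.
TOY_LEXICON = {"hola": "hello", "amigo": "friend", "gracias": "thanks"}


def translate_word(word: str) -> str:
    """Translate one word, passing unknown words through unchanged."""
    return TOY_LEXICON.get(word, word)


def turn_based(words: Iterable[str]) -> list[tuple[int, str]]:
    """Buffer the whole utterance, then translate it in one go.

    Each result is (step_when_emitted, translated_word). Nothing is
    emitted until the final word has been heard.
    """
    buffered = list(words)
    finished_at = len(buffered)
    return [(finished_at, translate_word(w)) for w in buffered]


def streaming(words: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Emit a translation for each word as soon as it arrives."""
    for step, word in enumerate(words, start=1):
        yield step, translate_word(word)


utterance = ["hola", "amigo", "gracias"]
print(turn_based(utterance))       # first output at step 3
print(list(streaming(utterance)))  # first output at step 1
```

In the turn-based version the listener hears nothing until step 3, while the streaming version starts producing output at step 1, which is the latency gap the continuous approach is meant to close.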
Weerasinghe and Lu said Gemini 3.5 Live Translate is designed for real-world conditions, meaning it can perform well in noisy environments and handle overlapping voices and informal speech. That makes it suitable for practical use cases such as customer support calls, classrooms, guided tours, ride-sharing services and live broadcasts, they said.
They also emphasized the quality of the model’s voices. Rather than using the robotic, synthetic voices found in the standard Google Translate app, it tries to preserve the speaker’s authenticity by matching their pacing, intonation and emotional tone. As a result, the translated speech sounds a lot more natural, which helps the conversation flow.
Google has long been at the forefront of machine translation, having launched the original Google Translate application more than 20 years ago, said Holger Mueller of Constellation Research. “The release of Gemini 3.5 Live Translate shows it has not yet relinquished that lead, both in terms of translation quality and supported languages,” he said. “Now it’s pushing the envelope further again with simultaneous translation in a consumer app for the first time, and it may even be better quality than some human translators. Certainly, it is going to be a whole lot cheaper.”
Google’s long-term goal with Gemini 3.5 Live Translate is to enable people to converse naturally with anyone in the world, regardless of the languages they speak. If it works as described, it could make life easier for travelers and anyone doing business across language barriers.