Google makes its Speech-to-Text and Text-to-Speech services more accurate and accessible
Google LLC is pushing its popular Cloud Speech-to-Text and Text-to-Speech machine learning services harder, adding new features to both and making the former more accessible to large enterprises.
Google Cloud Speech-to-Text is essentially an advanced transcription service that relies on machine learning and other artificial intelligence technologies to improve its accuracy. This reliance on AI is important, because anything less than 100 percent accuracy can result in serious mistakes that make it difficult to have a useful conversation, Google Product Manager Dan Aharon said in a blog post today.
But Google freely admits that its Speech-to-Text isn’t always entirely accurate because many applications that use it run on “noisy” phone lines that can make it difficult to interpret exactly what people are saying.
“When creating intelligent voice applications, speech recognition accuracy is critical,” Aharon said. “As you can see with the illustration below, even at 90 percent accuracy, it’s hard to have a useful conversation.”
To account for this, Google last year introduced premium Speech-to-Text models in beta for customers that opt to share usage data, which helps the company refine its algorithms. The premium models include an enhanced phone model that produces 62 percent fewer transcription errors than the regular model and a video model that’s useful for conference calls with multiple speakers.
Today, Google is making its premium models generally available to all customers, including those who don’t want to opt into its data logging program. However, not opting in to the program comes at a price, because those who do opt in will pay 33 percent less for the service.
“We’ve also cut pricing for the premium video model by 25 percent, for a total savings of 50 percent for current video model customers who opt-in to data logging,” Aharon said.
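The 25 percent cut and the 33 percent opt-in discount add up to 50 percent only if the second discount applies to the already-reduced price. The short sketch below checks that arithmetic. It assumes the discounts compound in that way and that "33 percent" is shorthand for one third; neither detail is spelled out in Google's announcement, and the function name is made up for illustration.

```python
from fractions import Fraction


def combined_discount(*discounts):
    """Overall discount when each discount applies to the already-reduced price."""
    remaining = Fraction(1)
    for d in discounts:
        remaining *= 1 - Fraction(d)
    return 1 - remaining


video_model_cut = Fraction(25, 100)   # price cut on the premium video model
data_logging_discount = Fraction(1, 3)  # assumption: "33 percent" read as one third

total = combined_discount(video_model_cut, data_logging_discount)
print(f"{float(total):.0%}")  # prints 50%
```

Simply adding the two figures would give 58 percent, so the 50 percent quoted by Aharon is consistent with the compounding reading.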
Google is also adding a new feature to Speech-to-Text called “multi-channel recognition,” which can better distinguish among different people in a conversation.
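For developers, multi-channel recognition is switched on in the recognition request itself. The following is a minimal sketch of what such a request body might look like, built as a plain Python dictionary. The field names follow Google's public REST reference for Speech-to-Text as best understood here and should be checked against the current documentation; the bucket path, sample rate and helper function are hypothetical.

```python
import json


def build_recognize_request(gcs_uri, channels=2, sample_rate_hz=8000):
    """Build a JSON-style request body asking for per-channel transcription.

    Hypothetical helper: it only assembles a dictionary and does not call
    any Google service.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": "en-US",
            # Transcribe each audio channel separately, e.g. one per caller.
            "audioChannelCount": channels,
            "enableSeparateRecognitionPerChannel": True,
            # Ask for the enhanced (premium) phone model.
            "useEnhanced": True,
            "model": "phone_call",
        },
        "audio": {"uri": gcs_uri},
    }


# The bucket and file name are placeholders, not a real recording.
request_body = build_recognize_request("gs://example-bucket/call.wav")
print(json.dumps(request_body, indent=2))
```

In a two-channel call recording where the agent and the customer are on separate channels, each transcript segment would then be attributable to one side of the conversation.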
Speech-to-Text’s premium models have already been adopted by numerous enterprises, including LogMeIn Inc., which uses the service to create transcripts of meetings on its popular GoToMeeting app.
More voices and languages for Text-to-Speech
Google is also updating Text-to-Speech, which does precisely the opposite of its sister service, transforming written text into artificial speech in realistic human voices. The service is getting more artificial voices powered by Google’s WaveNet technology, and is being made available in more languages. “Thanks to unique access to WaveNet technology powered by Google Cloud TPUs [Tensor Processing Unit AI chips], we can build new voices and languages faster and easier than is typical in the industry,” Aharon said.
New languages being introduced in beta today include Danish, Norwegian, Portuguese, Russian, Polish, Slovakian and Ukrainian. This means Text-to-Speech now supports 21 languages in total. Google is also adding 31 new artificial WaveNet voices to the service, plus 24 “standard” voices.
Lastly, Google is adding a new Device Profiles feature to Text-to-Speech that’s able to optimize audio playback on different kinds of hardware. “For example, some customers with call center applications optimize for interactive voice response, whereas others that focus on content and media optimize for headphones,” Aharon said. “In every case, the audio effects are customized for the hardware.”
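In practice, a device profile is chosen per synthesis request. Below is a minimal sketch of such a request body, again as a plain Python dictionary. The `effectsProfileId` field and the two profile identifiers are taken from Google's public Text-to-Speech reference as best understood here and should be verified against the current documentation; the helper function and sample text are hypothetical.

```python
import json

# Two of the documented profile identifiers, matching the use cases Aharon mentions.
CALL_CENTER_PROFILE = "telephony-class-application"
HEADPHONE_PROFILE = "headphone-class-device"


def build_synthesize_request(text, profile_id, language_code="en-US"):
    """Build a JSON-style synthesis request tuned for one class of playback hardware.

    Hypothetical helper: it only assembles a dictionary and does not call
    any Google service.
    """
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {
            "audioEncoding": "MP3",
            # Audio effects are applied in the order the profiles are listed.
            "effectsProfileId": [profile_id],
        },
    }


ivr_request = build_synthesize_request("Your call is important to us.", CALL_CENTER_PROFILE)
print(json.dumps(ivr_request, indent=2))
```

Swapping in `HEADPHONE_PROFILE` would request the same speech optimized for headphone listening rather than a phone line.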
Analyst Holger Mueller of Constellation Research Inc. said the updates are compelling because speech is rapidly emerging as the new user interface, so improvements to accuracy and support for more languages should be welcomed by enterprises.
“Google keeps providing and improving, and now adds support for better consumability,” Mueller said. “CxOs who’re building voice-related applications simply have to include Google on their shortlist of enabling providers.”
Photo: Robert Scoble/Flickr