UPDATED 11:00 EST / FEBRUARY 21 2019

Google makes its Speech-to-Text and Text-to-Speech services more accurate and accessible

Google LLC is expanding its popular Cloud Speech-to-Text and Text-to-Speech machine learning services, adding new features to both and making the former more accessible to large enterprises.

Google Cloud Speech-to-Text is essentially an advanced transcription service that relies on machine learning and other artificial intelligence technologies to improve its accuracy. This reliance on AI is important, because anything less than 100 percent accuracy can result in serious mistakes that make it difficult to have a useful conversation, Google Product Manager Dan Aharon said in a blog post today.

But Google freely admits that its Speech-to-Text isn’t always entirely accurate because many applications that use it run on “noisy” phone lines that can make it difficult to interpret exactly what people are saying.

“When creating intelligent voice applications, speech recognition accuracy is critical,” Aharon said. “As you can see with the illustration below, even at 90 percent accuracy, it’s hard to have a useful conversation.”


To address this, Google introduced premium Speech-to-Text models in beta last year for customers that opted to share usage data to help refine its algorithms. The premium models include an enhanced phone model that produces 62 percent fewer transcription errors than the regular model and a video model that’s useful for conference calls with multiple speakers.

Today, Google is making its premium models generally available to all customers, including those who don’t want to opt in to its data logging program. However, declining comes at a price, because those who do opt in will pay 33 percent less for the service.

“We’ve also cut pricing for the premium video model by 25 percent, for a total savings of 50 percent for current video model customers who opt-in to data logging,” Aharon said.
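The 50 percent figure follows if the two discounts are stacked multiplicatively rather than added. Here is a minimal sketch of that arithmetic, assuming the 33 percent opt-in discount applies on top of the already-reduced video price. The function name and the normalized base price of 1.0 are illustrative and do not reflect Google's actual rates.

```python
def stacked_discount(*discounts: float) -> float:
    """Return the total fractional saving when discounts are applied in sequence."""
    remaining = 1.0  # hypothetical normalized base price
    for d in discounts:
        remaining *= 1.0 - d  # each discount applies to what is left
    return 1.0 - remaining


# Figures quoted in the article: a 25 percent video-model price cut
# followed by a 33 percent discount for opting in to data logging.
total = stacked_discount(0.25, 0.33)
print(f"Total saving: {total:.0%}")
```

Under these assumptions the combined saving comes to just under half (0.75 × 0.67 ≈ 0.50 of the original price remains), which is consistent with the rounded figure Aharon quotes.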

Google is also adding a new feature to Speech-to-Text called “multi-channel recognition,” which can better distinguish among different people in a conversation.
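For developers, multi-channel recognition is switched on through the recognition config. The sketch below builds a JSON request body only and makes no network call. It assumes each speaker is recorded on a separate audio channel, and it uses field names from the public Speech-to-Text v1 REST reference that should be verified against the current documentation. The function name and bucket path are hypothetical.

```python
import json


def build_recognize_request(gcs_uri: str, channels: int = 2) -> dict:
    """Build a request body asking for a separate transcript per audio channel."""
    return {
        "config": {
            "languageCode": "en-US",
            # Number of channels in the recording (assumed: one per speaker).
            "audioChannelCount": channels,
            # Ask the service to transcribe each channel independently.
            "enableSeparateRecognitionPerChannel": True,
        },
        "audio": {"uri": gcs_uri},
    }


body = build_recognize_request("gs://example-bucket/call.wav")  # hypothetical path
print(json.dumps(body, indent=2))
```

Building the body as a plain dictionary keeps the example free of client library and credential requirements; a real application would send it to the service with an authenticated client.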

Speech-to-Text’s premium models have already been adopted by numerous enterprises, including LogMeIn Inc., which uses the service to create transcripts of meetings on its popular GoToMeeting app.

More voices and languages for Text-to-Speech

Google is also updating Text-to-Speech, which does precisely the opposite of its sister service, transforming written text into artificial speech in realistic human voices. The service is getting more artificial voices powered by Google’s WaveNet technology, and is being made available in more languages. “Thanks to unique access to WaveNet technology powered by Google Cloud TPUs [Tensor Processing Unit AI chips], we can build new voices and languages faster and easier than is typical in the industry,” Aharon said.

New languages being introduced in beta today include Danish, Norwegian, Portuguese, Russian, Polish, Slovakian and Ukrainian. This means Text-to-Speech now supports 21 languages in total. Google is also adding 31 new artificial WaveNet voices to the service, plus 24 “standard” voices.

Lastly, Google is adding a new Device Profiles feature to Text-to-Speech that’s able to optimize audio playback on different kinds of hardware. “For example, some customers with call center applications optimize for interactive voice response, whereas others that focus on content and media optimize for headphones,” Aharon said. “In every case, the audio effects are customized for the hardware.”
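In practice, a device profile is selected per synthesis request. The following sketch builds such a request body without calling the service. The profile IDs and field names are taken from Google's public Text-to-Speech documentation and should be treated as assumptions to verify, while the use-case keys and function name are purely illustrative.

```python
import json

# Mapping from the article's two example use cases to audio profile IDs.
# The IDs are assumptions to check against the current documentation.
PROFILES = {
    "call_center": "telephony-class-application",
    "headphones": "headphone-class-device",
}


def build_synthesize_request(text: str, use_case: str) -> dict:
    """Build a synthesis request body tuned for the given playback hardware."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": "en-US"},
        "audioConfig": {
            "audioEncoding": "MP3",
            # Audio effects are applied for the listed device profile.
            "effectsProfileId": [PROFILES[use_case]],
        },
    }


request = build_synthesize_request("Your call is important to us.", "call_center")
print(json.dumps(request, indent=2))
```

The same text can be synthesized for a different target by changing only the use-case key, which mirrors Aharon's point that the audio effects are customized for the hardware.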

Analyst Holger Mueller of Constellation Research Inc. said the updates are compelling because speech is rapidly emerging as the new user interface, so improvements to accuracy and support for more languages should be welcomed by enterprises.

“Google keeps providing and improving, and now adds support for better consumability,” Mueller said. “CxOs who’re building voice-related applications simply have to include Google on their shortlist of enabling providers.”

Photo: Robert Scoble/Flickr

