UPDATED 22:41 EST / JULY 30 2019

Amazon’s text-to-speech service Polly gets a newscaster-style voice

Amazon Web Services Inc. is taking on Google LLC in human voice replication, adding two new features today to Amazon Polly, a cloud-based service that transforms text into lifelike speech and is used to create applications that can talk.

The first of the new features is called Neural Text-To-Speech, which Amazon says delivers “significant improvements” in speech quality by boosting the “naturalness” and “expressiveness” of synthesized voices.

Neural Text-To-Speech can learn a new speaking style from just a few hours of training data, thanks to an artificial intelligence model that Amazon described in a research paper last year. That model works by combining large amounts of standard, neutral speech with just a few hours of additional voice data in the target speaking style. New supplementary data can be added as desired to create additional speaking styles.

Using Neural Text-To-Speech’s underlying algorithm, Amazon created its second new feature, a newscaster-style voice that makes narration sound “even more realistic” when reading news articles and similar content, AWS evangelist Julien Simon wrote in a blog post.

“Speech quality is certainly important, but more can be done to make a synthetic voice sound even more realistic and engaging,” Simon said. “What about style? For sure, human ears can tell the difference between a newscast, a sportscast, a university class and so on; indeed, most humans adopt the right style of speech for the right context, and this certainly helps in getting their message across.”

Simon said organizations including The Globe and Mail, Encyclopedia Britannica and TIM Media are already using Polly’s newscaster style. The feature has also been introduced to Amazon Alexa-enabled devices, where it’s used to narrate daily news briefings and similar content.

Amazon said the newscaster style is available in two English voices, while Neural Text-To-Speech is available in 11: three with U.K. English accents and eight with U.S. English accents. The voices all work in real time and in batch mode, and can be accessed from Amazon’s US East (N. Virginia), US West (Oregon) and Europe (Ireland) AWS regions.
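
For developers who want a sense of what calling the new features might look like, here is a minimal Python sketch using the AWS SDK (boto3). It is an illustration under stated assumptions, not code from Amazon's announcement: the voice name "Joanna", the SSML tag `<amazon:domain name="news">` and the `Engine="neural"` parameter come from AWS's public Polly documentation rather than from this article, and the helper names (`newscaster_ssml`, `build_request`, `synthesize`) are hypothetical. The region check reflects only the three launch regions named above, and availability may differ today.

```python
from xml.sax.saxutils import escape

# Launch regions named in the article, written as standard AWS region codes.
# Assumption: this set is only a snapshot; it is not queried from AWS.
NEURAL_REGIONS = {"us-east-1", "us-west-2", "eu-west-1"}


def newscaster_ssml(text: str) -> str:
    """Wrap plain text in the SSML tag that selects the newscaster style."""
    # escape() protects characters such as & and < that would break the SSML.
    return (
        '<speak><amazon:domain name="news">'
        + escape(text)
        + "</amazon:domain></speak>"
    )


def build_request(text: str, voice_id: str = "Joanna") -> dict:
    """Build keyword arguments for Polly's SynthesizeSpeech operation."""
    return {
        "Engine": "neural",      # request the Neural Text-To-Speech engine
        "VoiceId": voice_id,     # assumed voice; check which voices support the style
        "OutputFormat": "mp3",
        "TextType": "ssml",
        "Text": newscaster_ssml(text),
    }


def synthesize(text: str, path: str, region: str = "us-east-1") -> None:
    """Call Polly and write the returned audio to `path` (needs AWS credentials)."""
    if region not in NEURAL_REGIONS:
        raise ValueError(f"{region} is not one of the launch regions listed here")
    import boto3  # imported lazily so the helpers above work without the SDK

    client = boto3.client("polly", region_name=region)
    response = client.synthesize_speech(**build_request(text))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

With credentials configured, a call such as `synthesize("Markets closed higher today.", "briefing.mp3")` would be expected to save an MP3 read in the newscaster style. Keeping the request builder separate from the network call makes it easy to inspect the SSML before sending anything to AWS.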

Constellation Research Inc. analyst Holger Mueller said the Amazon Polly updates show that all of the major platform-as-a-service companies are getting serious about chatbots and conversational interfaces in general, since these interfaces are quickly revolutionizing customer and employee experiences.

“With these new capabilities Amazon is focused on one of the three vital parts of conversational platforms, namely speech output,” Mueller said. “Its progress in making software-created speech is impressive, but we’ll have to see how quickly enterprises adopt the new capabilities.”

Amazon Polly rivals Google’s Text-to-Speech service, which is powered by its WaveNet framework and currently offers 57 voice styles in 21 languages. Microsoft Corp. offers a similar service, the Azure Speech Service API, with 75 standard voices and three AI-generated voices.

Photo: Robert Hof/SiliconANGLE
