UPDATED 22:41 EST / JULY 30 2019


Amazon’s text-to-speech service Polly gets a newscaster-style voice

Amazon Web Services Inc. is taking on Google LLC in human voice replication, adding two new features today to Amazon Polly, a cloud-based service that transforms text into lifelike speech and is used to create applications that can talk.

The first new feature, Neural Text-To-Speech, delivers what Amazon calls "significant improvements" in speech quality by boosting the "naturalness" and "expressiveness" of synthesized voices.

A key strength of Neural Text-To-Speech is that it can learn new speaking styles from just a few hours of training data, thanks to an artificial intelligence model that Amazon described in a research paper last year. The model combines large amounts of standard, neutral speech with a few hours of additional recordings in the target speaking style. More supplementary data can be added as needed to create further speaking styles.


Using the algorithm underlying Neural Text-To-Speech, Amazon created its second new feature: a newscaster-style voice that makes narration sound "even more realistic" when reading news articles and similar content, AWS evangelist Julien Simon wrote in a blog post.

“Speech quality is certainly important, but more can be done to make a synthetic voice sound even more realistic and engaging,” Simon said. “What about style? For sure, human ears can tell the difference between a newscast, a sportscast, a university class and so on; indeed, most humans adopt the right style of speech for the right context, and this certainly helps in getting their message across.”

Simon said organizations including The Globe and Mail, Encyclopedia Britannica and TIM Media are already using Polly’s newscaster style. The feature has also been introduced to Amazon Alexa-enabled devices, where it’s used to narrate daily news briefings and similar content.


Amazon said the newscaster style is available in two English voices, while Neural Text-To-Speech is available in 11 voices: three with U.K. English accents and eight with U.S. English accents. All of the voices work in both real-time and batch mode and can be accessed from the US East (N. Virginia), US West (Oregon) and Europe (Ireland) AWS regions.
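For developers, the newscaster style is requested through Polly's API rather than chosen in a console. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) and the SSML `<amazon:domain name="news">` tag that Polly's documentation describes for the newscaster style. The helper name `build_newscaster_request` and the default voice `Joanna` (assumed here to be one of the two newscaster voices) are illustrative choices, not part of Amazon's announcement. The helper only builds the request parameters, so it runs without AWS credentials; the commented lines show how they could be passed to `synthesize_speech`.

```python
from xml.sax.saxutils import escape


def build_newscaster_request(text: str, voice_id: str = "Joanna") -> dict:
    """Build keyword arguments for Polly's synthesize_speech call (hypothetical helper).

    Assumption: the newscaster style is selected with the
    <amazon:domain name="news"> SSML tag and the neural engine.
    """
    # Escape &, < and > so article text cannot break the SSML markup.
    ssml = (
        '<speak><amazon:domain name="news">'
        + escape(text)
        + "</amazon:domain></speak>"
    )
    return {
        "Engine": "neural",      # newscaster style builds on the neural engine
        "OutputFormat": "mp3",
        "TextType": "ssml",      # tell Polly to parse the SSML tags
        "Text": ssml,
        "VoiceId": voice_id,
    }


# With boto3 installed and AWS credentials configured, the request could be sent like this:
# import boto3
# polly = boto3.client("polly", region_name="us-east-1")  # US East (N. Virginia)
# response = polly.synthesize_speech(**build_newscaster_request("Markets rose today."))
# with open("briefing.mp3", "wb") as f:
#     f.write(response["AudioStream"].read())
```

Keeping the parameter-building step separate from the network call makes the SSML easy to check locally before any audio is generated; batch jobs would use a different Polly operation and are not shown here.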

Constellation Research Inc. analyst Holger Mueller said the Amazon Polly updates show that all of the major platform-as-a-service companies are getting serious about chatbots and conversational interfaces in general, since such interfaces are quickly revolutionizing customer and employee experiences.

“With these new capabilities Amazon is focused on one of the three vital parts of conversational platforms, namely speech output,” Mueller said. “Its progress in making software-created speech is impressive, but we’ll have to see how quickly enterprises adopt the new capabilities.”

Amazon Polly rivals Google's Text-to-Speech service, which is powered by its WaveNet framework and currently offers 57 voice styles in 21 languages. Microsoft Corp. has a similar service, the Azure Speech Service API, which provides 75 standard voices and three AI-generated voices.

Photo: Robert Hof/SiliconANGLE
