UPDATED 19:41 EDT / APRIL 02 2020

CLOUD

Microsoft’s Azure Cognitive Services gets new voice styles

Microsoft Corp. today added new “voice styles” to Azure Cognitive Services, its cloud-based suite of application programming interfaces and software development kits that developers use to create apps with intelligent voice capabilities.

The new styles (newscast, customer service and digital assistant) are designed to help developers tailor the voice of their apps and services to fit their brand or use case. The voices deliver natural-sounding speech that matches the intonation and patterns of real human voices, the company said.

“Built on a powerful base model, our neural TTS voices are very natural, reliable, and expressive,” Microsoft said in a blog post. “Through transfer learning, the neural TTS model can learn different speaking styles from various speakers, enabling nuanced voices.”

The first of the voices, newscast, is intended to reflect the kind of “professional tone” associated with TV news reporters, with no trace of regionalism. Its pronunciation is neutral, and no letter sounds are dropped.

Microsoft said the newscast voice is also being made available in its Microsoft Listening Docs for WeChat service, which can read aloud documents in Word, Excel and PowerPoint. The voice is also featured in the Bing mobile app for those who want their daily news briefs read aloud.

The customer service voice, aimed at developers who build customer service apps, has a “friendly” and “engaging” tone, Microsoft said. The digital assistant voice has a “helpful” tone and is suited to tasks such as relaying a weather forecast or navigation directions.

Microsoft has also added new “emotion styles,” which express different emotions to fit a given context. They include cheerfulness and empathy in English and Brazilian Portuguese, plus a “lyrical style,” optimized for reading prose and poetry, that is available only in Chinese.
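For developers wondering how a style is actually selected, the sketch below shows one plausible approach: wrapping the text in Speech Synthesis Markup Language with Microsoft's `mstts:express-as` element. The article itself does not give identifiers, so the style names in `ASSUMED_STYLES`, the default voice name `en-US-AriaNeural` and the helper `build_styled_ssml` are all assumptions made for illustration and should be checked against Microsoft's current documentation. The code only builds the markup and does not call the service.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed style identifiers (not given in the article); verify against
# the Speech service documentation for the voice you actually use.
ASSUMED_STYLES = {
    "newscast", "customerservice", "chat",
    "cheerful", "empathetic", "lyrical",
}


def build_styled_ssml(text, style, voice="en-US-AriaNeural", lang="en-US"):
    """Return an SSML string asking for `text` to be spoken in `style`.

    `voice` defaults to an assumed neural voice name; a style only takes
    effect if the chosen voice supports it.
    """
    if style not in ASSUMED_STYLES:
        raise ValueError(f"unknown style: {style!r}")
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"  # escape &, < and > so the markup stays well-formed
        "</mstts:express-as></voice></speak>"
    )


if __name__ == "__main__":
    print(build_styled_ssml("Here are today's top stories.", "newscast"))
```

The resulting string would then be passed to whichever speech synthesis client the app uses. Keeping the markup builder separate from the network call makes it easy to swap styles per message, for example newscast for headlines and a cheerful style for greetings.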

Constellation Research Inc. analyst Holger Mueller told SiliconANGLE that such voice capabilities are important, because voice is the new user interface and helps free people from the need to read information.

“Neural networks help to make these traditional robotically and mechanical voices sound more natural by picking up breaks, pitch and intonation,” Mueller said.

Azure Cognitive Services is rivaled by Google LLC’s WaveNet system, which offers a total of 57 different voice styles, including 31 AI-synthesized voices and 24 standard ones. Amazon Web Services Inc. also offers a service called Brand Voice that relies on AI to create custom spokespersons with a range of voice and emotion styles provided by its text-to-speech offering Amazon Polly.

Photo: Microsoft
