UPDATED 19:11 EST / MAY 25 2017


Baidu’s text-to-speech AI can replicate hundreds of accents

Virtual assistants give our smart devices seemingly human personalities, but despite their efforts to sound like real people, programs such as Apple Inc.’s Siri and Microsoft Corp.’s Cortana still sometimes sound a little robotic. Chinese web giant Baidu Inc. aims to change that with Deep Voice, an artificial intelligence designed to convert text into believable human speech.

Today, Baidu announced the release of Deep Voice 2, the second iteration of its text-to-speech AI, which uses deep learning to replicate human speech. According to the company, in just three months the system has scaled from 20 hours of speech in a single voice to hundreds of hours of speech across hundreds of different synthetic voices.

Baidu said that unlike similar TTS neural nets, Deep Voice 2 generates speech in real time, “as fast as it needs to be played.” The company also boasted that the AI can learn from relatively short recordings of many different voice sources. In a paper outlining the methodology behind Deep Voice 2, Baidu explained that this is a major breakthrough for TTS technology.

“Most TTS systems are built with a single speaker voice, and multiple speaker voices are provided by having distinct speech databases or model parameters,” the company said in its paper. “As a result, developing a TTS system with support for multiple voices requires much more data and development effort than a system which only supports a single voice.”

Baidu claims that Deep Voice 2 is 400 times faster than other TTS systems such as Google Inc.’s WaveNet, and the company believes that its AI could be a powerful way to improve interactive media and conversational interfaces. Deep Voice 2’s ability to replicate accents could be especially valuable for companies looking to roll out voice interfaces in multiple regions, as it would simplify the process of localizing a device’s speech. The AI could also allow users to swap out the voices used in their apps, giving their smart devices more customizable personalities.

You can listen to several samples of speech from Deep Voice 2 on Baidu’s website; they represent a small portion of the many voices the AI can use.

Deep Voice 2 is one of many AI projects that Baidu has in the works. The company also developed a speech-to-text program called Deep Speech 2, which it used to launch its own “voice first” keyboard app last year. In September, the company announced a partnership with chip maker Nvidia Corp. to provide cloud-updated 3D maps for Nvidia’s self-driving car projects. Then in March, Baidu revealed that it would be opening a second AI research facility in Silicon Valley.

Photo: simone.brunozzi – https://www.flickr.com/photos/simone_brunozzi/4469421200/, CC BY-SA 2.0
