Voice recognition now faster and more accurate than typing, says Stanford study
If you have ever had to type out a lengthy message on your phone, chances are you got frustrated with how long the process took and how many mistakes you had to fix along the way. But it is still better than dealing with a hit-or-miss voice recognition program, right?
Not according to a new study conducted by researchers at Stanford University, the University of Washington, and Baidu Inc., which found that speech-to-text is now not only three times faster than typing on mobile but also more accurate.
The researchers tested the effectiveness of voice recognition versus typing on a mobile device by setting up a head-to-head competition between Baidu’s Deep Speech 2 software and the iPhone’s built-in keyboards: QWERTY for English and Hanyu Pinyin for Mandarin. The study found that voice recognition is now three times faster than typing on a mobile screen for English and 2.8 times faster for Mandarin. Even more surprisingly, speech was also more accurate: its error rate was 20.4 percent lower than the keyboard’s for English and an astounding 63.4 percent lower for Mandarin.
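For readers curious how “accuracy” is scored in studies like this, the standard metric is word error rate: the number of word-level edits needed to turn the entered text into the target phrase, divided by the phrase length. The Python sketch below shows that textbook calculation; the example phrases are invented for illustration, and this is not the researchers’ code.

```python
# A minimal sketch (not the study's actual code) of word error rate (WER):
# the word-level edit distance between the entered text and the reference
# phrase, normalized by the reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of five gives a WER of 0.2 (20 percent).
print(word_error_rate("the quick brown fox jumps",
                      "the quick brown fax jumps"))
```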
James Landay, professor of computer science at Stanford, credits the rapid improvement in voice recognition programs over the last few years to advances in deep learning and big data.
“For 40 years we’ve been promised great speech recognition, but it really hasn’t worked well enough in terms of error rate and speed to use in real applications,” Landay said in a video produced by Stanford. “But we noticed that over the last two or three years that suddenly speech recognition was working really well due to deep learning and Big Data to train those deep neural networks.”
Speaking to NPR, Baidu chief scientist Andrew Ng said that typing is not a natural way for humans to interact, and that speech recognition is a much better fit for the way we communicate with one another.
“Humanity was never designed to communicate by using our fingers to poke at a tiny little keyboard on a mobile phone,” Ng said. “Speech has always been a much more natural way for humans to communicate with each other.”
Teaching language to computers
Baidu’s Deep Speech 2 software relies entirely on machine learning to parse human speech. Instead of being manually programmed with the rules of a language, it essentially taught itself by training on a wide range of real-world speech data.
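Baidu has described the general recipe in its research papers: convolutional layers that scan audio spectrograms, recurrent layers on top, and a connectionist temporal classification (CTC) loss that lets the network learn from recordings paired only with their transcripts. The toy PyTorch sketch below illustrates that shape of training step; every layer size, name, and input here is invented for illustration, and none of it is Baidu’s actual code.

```python
# A heavily simplified, hypothetical sketch of a Deep Speech 2-style model:
# convolution over spectrogram frames, recurrent layers, CTC loss.
import torch
import torch.nn as nn

class TinySpeechModel(nn.Module):
    def __init__(self, n_mels=80, hidden=256, n_chars=29):
        # n_chars: 26 letters + space + apostrophe + the CTC "blank" symbol
        super().__init__()
        self.conv = nn.Conv1d(n_mels, hidden, kernel_size=11, stride=2, padding=5)
        self.rnn = nn.GRU(hidden, hidden, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_chars)

    def forward(self, spectrograms):           # (batch, n_mels, time)
        x = torch.relu(self.conv(spectrograms))
        x = x.transpose(1, 2)                   # (batch, time, hidden)
        x, _ = self.rnn(x)
        return self.out(x).log_softmax(-1)      # per-frame character log-probs

model = TinySpeechModel()
ctc = nn.CTCLoss(blank=0)                       # CTC aligns frames to characters

specs = torch.randn(4, 80, 200)                 # fake batch of 4 short clips
log_probs = model(specs).transpose(0, 1)        # CTCLoss wants (time, batch, chars)
targets = torch.randint(1, 29, (4, 20))         # fake 20-character transcripts
input_lens = torch.full((4,), log_probs.size(0), dtype=torch.long)
target_lens = torch.full((4,), 20, dtype=torch.long)

loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()                                 # "teaching itself" = descending this loss
print(loss.item())
```

The key piece is the CTC loss: it spares engineers from hand-aligning every audio frame to a character, which is what allows the system to learn directly from raw recordings and their transcripts.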
Deep Speech 2 is capable of understanding both English and Mandarin, two of the most widely spoken languages in the world. Speaking to MIT Technology Review last year, Ng noted that thanks to deep learning, there was no need to hand-design separate features in the speech recognition software for English and Mandarin, despite the fact that the two languages are extremely different in both structure and sound.
While Mandarin has relatively few distinct syllables (combinations of sounds), it uses changes in pitch, known as tones, to alter the meaning of a word. That means multiple words can be pronounced with exactly the same sounds and are distinguished only by whether the pitch of your voice rises, falls, dips, or stays level when saying them. This is famously demonstrated in the poem “Lion-Eating Poet in the Stone Den,” which is composed entirely of syllables that read “shi,” differing only in tone.
Pitch has no effect on word meaning in English, meanwhile, but the language is full of homophones, words like “there” and “their” that sound identical and can only be told apart by context.
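To make the contrast concrete, the tiny snippet below lists a handful of textbook cases of the syllable “shi” under each of Mandarin’s four tones; the word list is purely illustrative and is not data from the study.

```python
# Illustrative only: the syllable "shi" becomes a different word under each
# of Mandarin's four tones, so a recognizer must model pitch contours.
# English homophones carry no such acoustic cue and need context instead.
shi_by_tone = {
    1: ("shī", "师", "teacher"),
    2: ("shí", "十", "ten"),
    3: ("shǐ", "史", "history"),
    4: ("shì", "是", "to be"),
}
for tone, (pinyin, hanzi, gloss) in shi_by_tone.items():
    print(f"tone {tone}: {pinyin} {hanzi} = {gloss}")
```

For a recognizer, the upshot is that a Mandarin model must extract pitch information from the audio itself, while an English model leans on surrounding words to pick, say, “their” over “there.”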
You can watch a video explaining Stanford’s experiment below:
Photo by Jhaymesisviphotography