UPDATED 20:34 EDT / NOVEMBER 22 2016

NEWS

Google’s DeepMind learns to lip-read better than humans

Google may have found a way to use machine learning technology to help millions of deaf and hard-of-hearing people better understand what is being said to them.

Researchers from Google Inc.’s DeepMind artificial intelligence project, which built the board game-playing AlphaGo system that defeated one of the world’s top Go players, have teamed up with peers at the University of Oxford to create an AI system that’s able to outperform professional lip-readers after training itself on thousands of hours of BBC videos.

New Scientist reports that in tests, a human lip-reader who provides services for the U.K. courts was able to correctly decipher only about a quarter of words spoken when shown a random sample from 200 BBC video broadcasts. However, DeepMind’s AI system was able to decipher almost half of the words from the same sample videos. In addition, the AI was able to annotate 46 percent of the words without error, compared with just 12 percent by the human lip-reader.

The researchers hope that the technology could one day be used on phones, either as a new way to instruct a voice assistant like Siri, or as a way to enhance speech recognition.

“A machine that can lip-read opens up a host of applications: ‘dictating’ instructions or messages to a phone in a noisy environment; transcribing and redubbing archival silent films; resolving multi-talker simultaneous speech; and improving the performance of automated speech recognition in general,” the researchers said in their research paper.

Machine learning involves using massive data sets to train AI systems. In this case, the researchers trained their lip-reading system, called “Watch, Listen, Attend and Spell,” on almost 5,000 hours of talking faces from six BBC shows, such as BBC Breakfast, Newsnight and Question Time. The system was fed 118,000 sentences and 17,500 unique words in total.
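For readers curious how such a system fits together, below is a minimal sketch, in PyTorch, of the encoder-decoder-with-attention pattern that “Watch, Listen, Attend and Spell” is built on. It covers only the visual (“watch” and “spell”) side, not the audio stream, and the module names, dimensions and toy data are illustrative assumptions rather than DeepMind’s actual code.

# Hedged sketch of a "watch, attend and spell"-style lip-reading model:
# an encoder over per-frame visual features and a character-level decoder
# with attention. All names, sizes and data here are illustrative assumptions.
import torch
import torch.nn as nn

class LipReader(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, vocab_size=40):
        super().__init__()
        # "Watch": encode the sequence of visual (mouth-region) frame features.
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        # "Spell": emit one character at a time, conditioned on attended video.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTMCell(hidden * 2, hidden)
        self.attn = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frames, chars):
        # frames: (batch, time, feat_dim) visual features, one per video frame
        # chars:  (batch, length) previous characters (teacher forcing)
        enc, _ = self.encoder(frames)                  # (B, T, H)
        h = enc.new_zeros(enc.size(0), enc.size(2))
        c = torch.zeros_like(h)
        logits = []
        for t in range(chars.size(1)):
            # "Attend": score each encoded frame against the decoder state,
            # then take a softmax-weighted average as the context vector.
            scores = torch.bmm(enc, self.attn(h).unsqueeze(2)).squeeze(2)
            context = torch.bmm(scores.softmax(dim=1).unsqueeze(1), enc).squeeze(1)
            inp = torch.cat([self.embed(chars[:, t]), context], dim=1)
            h, c = self.decoder(inp, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)              # (B, length, vocab)

# Toy usage: random "video" features and a short character sequence.
model = LipReader()
frames = torch.randn(2, 75, 512)           # 2 clips, 75 frames each
chars = torch.randint(0, 40, (2, 20))      # 20 target characters per clip
print(model(frames, chars).shape)          # torch.Size([2, 20, 40])

In practice the model would be trained on labeled clips like the BBC corpus described above, with the character outputs compared against the known subtitles; the decoder then spells out sentences one character at a time at inference.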

The researchers explained that unlike other lip-reading systems, theirs was focused on interpreting “unconstrained natural language sentences” in “in-the-wild videos.” Previous systems, such as the University of Oxford’s LipNet, have targeted only a much more limited set of words and phrases.

DeepMind and the University of Oxford say they’ll make their data publicly available as a training resource for other researchers and projects.

Image credit: Google/Oxford University
