

Google may have found a way to use machine learning to help millions of deaf and hearing-impaired people better understand what others are saying to them.
Researchers from Google Inc.’s DeepMind artificial intelligence unit, which built AlphaGo, the system that defeated one of the world’s top Go players, have teamed up with peers at the University of Oxford to create an AI system that outperforms professional lip-readers after training on thousands of hours of BBC video.
New Scientist reports that in tests, a human lip-reader who provides services for U.K. courts correctly deciphered only about a quarter of the words spoken in a random sample of 200 BBC video broadcasts. DeepMind’s AI system, by contrast, deciphered almost half of the words in the same videos, and annotated 46 percent of them without error, compared with just 12 percent for the human lip-reader.
The researchers hope that the technology could one day be used on phones, either as a new way to instruct a voice assistant like Siri, or as a way to enhance speech recognition.
“A machine that can lip-read opens up a host of applications: ‘dictating’ instructions or messages to a phone in a noisy environment; transcribing and redubbing archival silent films; resolving multi-talker simultaneous speech; and improving the performance of automated speech recognition in general,” the researchers said in their research paper.
Machine learning involves training AI systems on massive data sets. In this case, the researchers trained their lip-reading system, called “Watch, Listen, Attend and Spell,” on almost 5,000 hours of footage of talking faces from six BBC shows, including BBC Breakfast, Newsnight and Question Time. In all, the system was fed 118,000 sentences containing 17,500 unique words.
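To make that design concrete, here is a minimal sketch of what a “Watch, Listen, Attend and Spell”-style model can look like: a convolutional network encodes each video frame of the speaker’s mouth, a recurrent encoder summarizes the frames over time, and an attention-based recurrent decoder spells out the transcript one character at a time. This is an illustrative approximation in PyTorch, not DeepMind’s code; all layer sizes and names are invented assumptions, and the real system also attends to an audio stream (the “Listen” branch), which is omitted here for brevity.

```python
# A minimal, assumption-laden sketch of a "Watch, Attend and Spell"-style
# lip-reader. Not DeepMind's implementation; sizes and names are illustrative.
import torch
import torch.nn as nn

class LipReader(nn.Module):
    def __init__(self, num_chars=40, feat_dim=512, hidden=256):
        super().__init__()
        # "Watch": a small CNN turns each mouth-region frame into a feature vector.
        self.frame_cnn = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # An LSTM summarizes the per-frame features over time.
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        # "Attend and Spell": a character-level decoder that attends to the video.
        self.embed = nn.Embedding(num_chars, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.decoder = nn.LSTM(hidden * 2, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_chars)

    def forward(self, frames, prev_chars):
        # frames: (batch, time, 1, H, W); prev_chars: (batch, out_len)
        b, t = frames.shape[:2]
        feats = self.frame_cnn(frames.flatten(0, 1)).view(b, t, -1)
        enc, _ = self.encoder(feats)               # (b, t, hidden)
        emb = self.embed(prev_chars)               # (b, out_len, hidden)
        ctx, _ = self.attn(emb, enc, enc)          # attend over video frames
        dec, _ = self.decoder(torch.cat([emb, ctx], -1))
        return self.out(dec)                       # per-step character logits
```

Training such a model would minimize cross-entropy between the predicted character logits and the ground-truth transcript characters, which is where a corpus on the scale of those 118,000 BBC sentences comes in.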
The researchers explained that, unlike other lip-reading systems, theirs focused on interpreting “unconstrained natural language sentences” and “in-the-wild videos.” Previous systems, such as the University of Oxford’s LipNet, focused on recognizing only a much more limited set of words and phrases, a difference the short sketch below illustrates.
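As a hedged illustration of that difference (with invented function names, not either project’s actual code): a closed-vocabulary recognizer can only choose among a fixed word list, whereas a character-level decoder like the one sketched above has an output space covering any sentence that can be spelled, which is what makes an open, 17,500-word setting tractable.

```python
# Illustrative contrast between closed- and open-vocabulary decoding.
# These helpers are hypothetical, for exposition only.
CHARS = "abcdefghijklmnopqrstuvwxyz '"

def closed_vocab_decode(word_scores, vocab):
    # One class per known word: anything outside `vocab` can never be output.
    best = max(range(len(vocab)), key=lambda i: word_scores[i])
    return vocab[best]

def open_vocab_decode(char_logits):
    # Greedy character-by-character decoding: char_logits holds one list of
    # len(CHARS) scores per output step, so even unseen words can be spelled.
    return "".join(
        CHARS[max(range(len(CHARS)), key=lambda i: step[i])]
        for step in char_logits
    )
```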
DeepMind and the University of Oxford say they’ll make their data set publicly available as a training resource for other researchers and projects.