Google develops an AI that can mimic the brain’s ‘cocktail party effect’
Researchers at Google LLC have developed a new approach to applying artificial intelligence that could help enhance many of the search giant’s services, from YouTube to Hangouts.
The breakthrough, which hit the news cycle today following the company's publication of the details on Wednesday, has to do with a phenomenon known as the cocktail party effect. The term refers to the brain's ability to home in on a single sound source, such as one person's voice, in an environment rife with distractions such as other people talking.
AI models often struggle to tune out peripheral input with the same effectiveness, particularly when it comes to audio streams that contain multiple voices. This has proved to be a major challenge in the field of speech recognition, which is among the main applications of neural networks today.
Google said that its researchers have managed to overcome the obstacle by developing a deep learning model that takes into account a different type of information: visual input. Designed to process videos, the AI can analyze the mouth movements of the people shown in a clip to match each individual with the appropriate voice. Once it has made the necessary associations, the model can separate the individual speech tracks.
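In rough terms, the approach can be pictured as a network with two input streams, one for the mixed audio and one for the target speaker's face, whose outputs are fused to predict which parts of the soundtrack belong to that speaker. The sketch below illustrates that structure in Python using PyTorch; the layer sizes, the LSTM encoders and the mask-based output are illustrative assumptions for the sake of the example, not the architecture Google published.

```python
# Minimal illustrative sketch (not Google's published model): fuse per-speaker
# visual features with the mixed-audio spectrogram and predict a time-frequency
# mask that isolates that speaker's voice. All sizes and names are assumptions.
import torch
import torch.nn as nn


class AudioVisualSeparator(nn.Module):
    def __init__(self, n_freq_bins=257, visual_dim=512, hidden_dim=256):
        super().__init__()
        # Audio stream: encode the magnitude spectrogram of the mixed audio.
        self.audio_net = nn.LSTM(n_freq_bins, hidden_dim,
                                 batch_first=True, bidirectional=True)
        # Visual stream: encode per-frame face/mouth embeddings for one speaker.
        self.visual_net = nn.LSTM(visual_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)
        # Fusion: concatenate both streams and predict a mask per frequency bin.
        self.mask_head = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_freq_bins),
            nn.Sigmoid(),  # mask values in [0, 1]
        )

    def forward(self, mixed_spec, face_embeddings):
        # mixed_spec: (batch, time, n_freq_bins) spectrogram of the mixture.
        # face_embeddings: (batch, time, visual_dim) features for the target
        # speaker; for simplicity both streams are assumed to share a frame rate.
        audio_feat, _ = self.audio_net(mixed_spec)
        visual_feat, _ = self.visual_net(face_embeddings)
        fused = torch.cat([audio_feat, visual_feat], dim=-1)
        mask = self.mask_head(fused)
        # Applying the mask to the mixture spectrogram isolates the chosen speaker.
        return mask * mixed_spec


# Usage sketch: one short clip of roughly 100 spectrogram frames.
model = AudioVisualSeparator()
mixture = torch.rand(1, 100, 257)        # mixed-audio spectrogram
target_faces = torch.rand(1, 100, 512)   # embeddings of the selected speaker
isolated = model(mixture, target_faces)  # estimated clean spectrogram
```

In a setup like this, choosing which speaker to hear amounts to choosing which person's face embeddings are fed into the visual stream, which is what makes the click-to-isolate interaction described below possible.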
Teaching the AI to perform the task efficiently was no small feat. Google's researchers collected 100,000 videos from YouTube, extracted sound segments that each contained the voice of a single speaker and then mixed those segments together into "synthetic cocktail parties" containing multiple overlapping voices. The team used the dataset to train the model to separate a speaker's voice from other sounds under various conditions.
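The value of this setup is that the clean, single-speaker segments double as ground truth: the model is asked to recover them from the artificial mixture. A minimal Python sketch of that mixing step might look like the following; the random gains and normalization are illustrative assumptions, not details of Google's actual pipeline.

```python
# Sketch of building a "synthetic cocktail party" training example from clean,
# single-speaker waveforms. The mixing scheme here is an assumption for
# illustration only.
import numpy as np


def make_synthetic_mixture(clean_tracks, rng=None):
    """Mix several single-speaker waveforms into one multi-voice clip.

    clean_tracks: list of 1-D numpy arrays at the same sample rate.
    Returns (mixture, scaled_tracks), where the scaled clean tracks serve
    as training targets for the separation model.
    """
    rng = rng or np.random.default_rng()
    length = min(len(t) for t in clean_tracks)
    mixture = np.zeros(length, dtype=np.float32)
    scaled_tracks = []
    for track in clean_tracks:
        track = track[:length].astype(np.float32)
        gain = rng.uniform(0.5, 1.0)  # speakers at different loudness levels
        scaled_tracks.append(gain * track)
        mixture += gain * track
    peak = np.max(np.abs(mixture)) or 1.0  # normalize to avoid clipping
    return mixture / peak, [t / peak for t in scaled_tracks]


# Usage sketch with two synthetic 3-second "voices" at 16 kHz.
sr = 16000
voice_a = np.random.randn(3 * sr)
voice_b = np.random.randn(3 * sr)
mix, targets = make_synthetic_mixture([voice_a, voice_b])
```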
The result, according to Google, is an AI that enables the user to click on the face of the person that they want to hear and have the other speakers in a video automatically muted. The technology has many potential uses for the search giant.
For starters, Google could implement a version of the AI in YouTube to let users tune out some of the sounds in a clip. This would be particularly convenient for videos recorded in noisy environments, where background sounds can make it difficult to hear the speaker.
The AI may also have the potential to enhance the user experience in Hangouts and Meet, Google’s video conferencing services, by making it easier for call participants to focus on a particular person’s voice. The search giant even believes that the technology could have medical applications, such as enabling the development of more sophisticated hearing aids.