UPDATED 13:15 EDT / JUNE 24 2022


Meta is building better AI-driven audio for virtual reality

When it comes to virtual reality, creating immersive worlds is more than just generating visually perfect environments. The way that sound works can make or break an experience.

To tackle the audio challenge, researchers at Meta Platforms Inc. today open-sourced three artificial intelligence models that take sound in the metaverse to a new level.

“Getting spatial audio right is key to delivering a realistic sense of presence in the metaverse,” said Mark Zuckerberg, founder and chief executive of Meta. “If you’re at a concert, or just talking with friends around a virtual table, a realistic sense of where sound is coming from makes you feel like you’re actually there.”

Sound behaves differently in different environments. Everyone knows the experience of singing in an enclosed space such as a shower, which is entirely different from talking in an open park. There’s also the way friends’ voices reflect off the walls of a living room, or the low murmur that fills a restaurant.

This is the essence of the first model, called the Visual Acoustic Matching model, which uses an image of the space to adjust sounds so that they match the target environment. For example, it could take an audio clip of a person speaking in an open field and match it to someplace cozy and intimate, making the voice sound closer and echo off nearby walls.
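Though Meta’s model learns this transformation from an image, the underlying acoustic idea is classical: a “dry” recording can be made to sound as if it were captured in a given room by convolving it with that room’s impulse response. The sketch below illustrates only that classical idea with entirely synthetic signals; it is not Meta’s method.

```python
import numpy as np

def apply_room_acoustics(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve a 'dry' signal with a room impulse response (RIR)."""
    wet = np.convolve(dry, rir)
    # Normalize so the reverberant result does not clip.
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet

# Toy example: a single click played through a synthetic "room" made of
# an exponential decay plus a few discrete reflections.
sr = 16000
dry = np.zeros(sr)
dry[0] = 1.0                                   # one click
rir = np.exp(-np.linspace(0.0, 8.0, sr // 2))  # decaying reverb tail
rir[::2000] += 0.3                             # a few distinct echoes
wet = apply_room_acoustics(dry, rir)
```

Where a real room’s impulse response is measured with microphones, Visual Acoustic Matching in effect infers the acoustic transformation from a photo of the target space.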

“Human listeners, without us even realizing it, are expecting to hear sounds in a certain way depending on the physical environment that we’re in,” said Kristen Grauman, research director at Meta AI. “That’s because audio is shaped by the environment we’re in.”

This could be useful for meetings with friends in the metaverse: when we don VR headsets we might be whisked away to a forest campsite to chat, but we don’t actually leave our living rooms or home offices. Recordings of our voices still carry the acoustics of the rooms we’re physically in, so the AI model can adjust that sound to match the gloaming-lit virtual forest and make it that much more immersive.

The next model does the opposite. Given knowledge of the environment, it removes the echoes created when sound bounces off surfaces, known as reverberation, to produce cleaner, crisper audio. The Visually Informed Dereverberation model could take a violinist’s performance recorded in a cavernous train station and make it sound as if it had been played in a studio.

The result is potentially better audio in general from headsets worn in homes and home offices, benefiting speech enhancement, speaker identification and speech recognition. With less echo sneaking into the audio, smart agents – and even people listening on the other end – would have an easier time understanding speech.
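Setting the learned model aside, the classical version of this problem is deconvolution: if the room’s impulse response were known, dividing it out in the frequency domain would undo the reverberation. A minimal sketch on a synthetic signal, using a regularized Wiener-style inverse (Meta’s model is notable precisely because it works blind, estimating the acoustics from an image rather than requiring a measured impulse response):

```python
import numpy as np

def dereverberate(wet: np.ndarray, rir: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Regularized frequency-domain deconvolution (Wiener-style inverse)."""
    n = len(wet)
    W = np.fft.rfft(wet, n)
    H = np.fft.rfft(rir, n)
    # Dividing W by H undoes the convolution; eps keeps frequencies
    # where the room response is weak from blowing up the estimate.
    D = W * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.fft.irfft(D, n)

# Round trip: reverberate a random "dry" signal, then recover it.
rng = np.random.default_rng(0)
dry = rng.standard_normal(4096)
rir = np.exp(-np.linspace(0.0, 6.0, 512))  # simple exponential decay
wet = np.convolve(dry, rir)                # reverberant signal
estimate = dereverberate(wet, rir)[: len(dry)]
```

In practice the impulse response is unknown and deconvolution alone is fragile, which is why learned, blind approaches like Meta’s are attractive.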

Finally, in the metaverse things will probably get a little noisy when lots of people are talking nearby, potentially over one another. VisualVoice takes a page from humans, who listen with more than just their ears – they also use their eyes, drawing clues from mouth movements and facial expressions.

The objective of VisualVoice is to disentangle individual voices from background noises and other voices that might be speaking at the same time and identify individual speakers. The result is that the AI model can provide better accessibility and potentially even create subtitles that attach to those speakers. It could even be used for smart agents to focus on and identify individuals in crowds.
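A common way such separation works under the hood is masking in the frequency domain: predict which frequency components belong to each speaker and suppress the rest. The sketch below fakes the hard part, applying a hand-made ideal mask to a mixture of two synthetic tones; in an audio-visual system like VisualVoice, a network guided by the speaker’s face would predict the mask for real, overlapping speech.

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr

# Two "speakers", stood in for by tones at different pitches; their
# mixture is what a single microphone would record.
speaker_a = np.sin(2 * np.pi * 440.0 * t)
speaker_b = np.sin(2 * np.pi * 1200.0 * t)
mixture = speaker_a + speaker_b

# Mask-based separation: keep only the frequency bins believed to
# belong to speaker A, zero out everything else.
spectrum = np.fft.rfft(mixture)
freqs = np.fft.rfftfreq(len(mixture), d=1.0 / sr)
mask_a = freqs < 800.0
recovered_a = np.fft.irfft(spectrum * mask_a, len(mixture))
```

Real voices overlap heavily in frequency, so a fixed cutoff like this would fail; the visual cues are what let a learned model decide, moment to moment, which components belong to which face.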

With these new AI models, Meta hopes to supply superior audio to immersive AR and VR experiences in the future. Virtual reality is already providing profound experiences with visual representations of spaces, so it’s important that the quality of the sound keeps up with it.

Grauman sees a future where this AI audio research will provide truly unique experiences for people in the metaverse, such as visiting a concert.

“As soon as you put on your headset the sounds from your home would fade away and the audio would adjust realistically as you move from the hallway into the concert hall and closer to the stage,” she said. “And, if you wanted, AI could enhance the experience so that you could enjoy the experience and still hear your friend next to you.”

Image: Meta
