UPDATED 12:00 EDT / JANUARY 20 2022

Meta AI creates algorithm that can learn from speech, text and vision

Meta Platforms Inc.’s artificial intelligence researchers have come up with what they say is the world’s first high-performance self-supervised algorithm that can train AI models across multiple modalities, whether speech, vision or text.

The algorithm is called data2vec. In a blog post today, Meta said it will enable the creation of smarter AI that can learn more generally and perform multiple tasks, including those that are unfamiliar.

Meta is trying to solve one of the big constraints of self-supervised learning, which enables machines to learn by directly observing their environment rather than being explicitly taught via labeled images, text or audio. Although that is a big improvement over supervised training, self-supervised learning remains difficult to scale because algorithms consume images, speech and other modalities in very different ways.

For example, an algorithm used to read text is trained to fill in blanks that have been masked out of sentences. A speech model, however, needs to learn an inventory of basic sounds in order to predict missing sounds in a person’s speech. Computer vision models, meanwhile, are usually trained to assign similar representations to a color image, of a cow perhaps, and the same image flipped upside down, so the model associates the two more closely than it would an unrelated image of, say, a dolphin.

AI algorithms also predict different units for each modality. Image recognition involves predicting pixels or visual tokens, while text involves words and speech requires models to predict sounds from a learned inventory.
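
To make the contrast concrete, here is a minimal sketch, in PyTorch, of the “fill in the blanks” text objective described above. The model, names and sizes (TinyMaskedLM, the vocabulary, the dimensions) are illustrative assumptions, not Meta’s code; the point is that the training target is a vocabulary of discrete tokens, something that exists only for text.

```python
# Minimal sketch of a masked-text objective (illustrative, not Meta's code).
import torch
import torch.nn as nn

VOCAB, DIM, MASK_ID = 1000, 64, 0

class TinyMaskedLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)   # predicts discrete token ids

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))

model = TinyMaskedLM()
tokens = torch.randint(1, VOCAB, (8, 16))   # a batch of token sequences
mask = torch.rand(tokens.shape) < 0.15      # hide roughly 15% of positions
corrupted = tokens.masked_fill(mask, MASK_ID)

logits = model(corrupted)
# The loss is computed over the token vocabulary at the masked positions --
# a target space that only makes sense for text, not for pixels or audio.
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
loss.backward()
```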

“This discrepancy has been a significant barrier to applying advances in self-supervised learning more broadly,” Meta’s AI researchers said. “Because a powerful algorithm designed for, say, understanding images can’t be directly applied to another modality, such as text, it is difficult to push several modalities ahead at the same rate.”

data2vec overcomes this by teaching AI models to predict their own representations of the input data, regardless of its modality. By focusing on those representations instead of the usual words, sounds or visual tokens, data2vec can work with multiple types of input data.
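
Going by that description, a minimal PyTorch sketch of the idea might look like the following: a “teacher” copy of the network encodes the full input, the student encodes a masked version, and the loss regresses the teacher’s continuous representations at the masked positions, after which the teacher tracks the student through an exponential moving average. Every name and hyperparameter here is an illustrative assumption; among other simplifications, the actual model averages several top teacher layers to form targets, where this sketch uses only the final layer.

```python
# Minimal sketch of data2vec-style self-distillation (illustrative, not
# Meta's released code): the student predicts the teacher's *representations*
# of masked positions, so no modality-specific target vocabulary is needed.
import copy
import torch
import torch.nn as nn

DIM = 64
layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
student = nn.TransformerEncoder(layer, num_layers=4)
teacher = copy.deepcopy(student)            # weight-tracking copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)

mask_emb = nn.Parameter(torch.zeros(DIM))   # learned embedding for masked slots

x = torch.randn(8, 32, DIM)                 # any modality, already featurized
mask = torch.rand(8, 32) < 0.5              # positions the student must infer

with torch.no_grad():
    targets = teacher(x)                    # teacher sees the unmasked input
    # (the paper averages the top-K teacher layers; one layer keeps this short)

student_in = torch.where(mask.unsqueeze(-1), mask_emb.expand_as(x), x)
preds = student(student_in)                 # student sees the masked input

# Regress the teacher's continuous representations at masked positions only;
# nothing in this loss is specific to speech, vision or text.
loss = nn.functional.smooth_l1_loss(preds[mask], targets[mask])
loss.backward()

# After each optimizer step, the teacher tracks the student through an
# exponential moving average of its weights (tau close to 1).
tau = 0.999
with torch.no_grad():
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(tau).add_(ps, alpha=1 - tau)
```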

Meta said it tested data2vec on the popular ImageNet computer vision benchmark and found it performed better than the best existing self-supervised methods. For speech, it was superior to Meta’s own wav2vec 2.0 self-supervised speech algorithm, while for text it was tested on the GLUE benchmark suite and performed as well as BERT, a widely used language model.

In a Facebook post, Meta Founder and Chief Executive Mark Zuckerberg described data2vec as one of the company’s most exciting breakthroughs in AI so far.

“Meta AI research built a system that learns from speech, vision and text without needing labeled training data,” Zuckerberg said. “People experience the world through a combination of sight, sound and words, and systems like this could one day understand the world the way we do. This will all eventually get built into AR glasses with an AI assistant so, for example, it could help you cook dinner, noticing if you miss an ingredient, prompting you to turn down the heat, or more complex tasks.”

Elaborating, Meta’s researchers said data2vec has great potential to help create a new breed of AI models that can learn by themselves to perform many different tasks, including unfamiliar ones. An AI would be able to recognize not only the animals it has come across in its training data, but also new creatures if it’s told what they look like.

“This paves the way for more general self-supervised learning and brings us closer to a world where AI might use videos, articles, and audio recordings to learn about complicated subjects, such as the game of soccer or different ways to bake bread,” Meta’s AI researchers said.

Meta believes the potential of data2vec is so great that it’s sharing the code and various pre-trained models with the wider AI research community so others can build on its work.

Image: Meta AI
