UPDATED 12:35 EST / JUNE 11 2025


Meta releases V-JEPA 2 AI model that understands the world through video

Meta Platforms Inc.’s AI research division today released a new artificial intelligence model designed to help robots and AI agents train on and understand the physical world by interpreting video, similar to how humans make sense of the world around them.

The model, named V-JEPA 2, or Video Joint Embedding Predictive Architecture 2, builds on the company’s previous work on V-JEPA and allows AI agents and robots to “think before they act.”

“As humans we think that language is very important for intelligence, but in fact that’s not the case,” said Yann LeCun, vice president and chief AI scientist at Meta. “Humans and animals navigate the world by building mental models of reality. What if AI could develop this kind of common sense, an ability to make predictions of what is going to happen in some kind of abstract representation of space?”

Meta said V-JEPA 2 is a state-of-the-art AI world model, trained on video, that enables robots and other AI models to understand the physical world and predict how it will respond to their actions.

World models allow AI agents and robots to build a concept of the physical world and understand the consequences of their actions, so they can plan a course of action for a given task. With a world model, a company or organization does not need to run a million trials with an AI in the real world, because the world model can simulate the world for the AI model, often within minutes, so that it trains with an understanding of how the world works.

A world model can also be used to predict what will happen after a certain action is taken, allowing a robot, or an AI attached to a sensor, to anticipate the next event that might happen. Humans do this all the time when planning next steps, such as when walking through an unfamiliar place while avoiding other people, or when playing hockey.
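The predict-then-plan loop described above can be illustrated with a toy sketch in Python. This is not Meta's method and not the V-JEPA 2 interface: the grid world, the `predict_next` function standing in for a learned world model, the cost function and the brute-force search are all hypothetical choices made only for illustration. The agent imagines every short sequence of moves, scores the predicted outcomes, takes the first move of the best sequence and then replans.

```python
import itertools

# Hypothetical action set for a toy 2-D grid world (illustration only).
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def predict_next(state, action):
    """Stand-in for a learned world model: predict the state after an action."""
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)


def cost(state, goal, obstacles):
    """Manhattan distance to the goal; predicted collisions are ruled out."""
    if state in obstacles:
        return float("inf")
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])


def plan(state, goal, obstacles, horizon=3):
    """Imagine every action sequence of length `horizon` and keep the cheapest."""
    best_seq, best_cost = None, float("inf")
    for seq in itertools.product(ACTIONS, repeat=horizon):
        imagined, total = state, 0
        for action in seq:
            imagined = predict_next(imagined, action)  # no real-world trial needed
            total += cost(imagined, goal, obstacles)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq


def run(start, goal, obstacles, max_steps=20):
    """Act on the first step of each plan, then replan from the new state."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            break
        seq = plan(state, goal, obstacles)
        if seq is None:  # every imagined future hits an obstacle
            break
        state = predict_next(state, seq[0])
        path.append(state)
    return path
```

Calling `run((0, 0), (3, 0), {(1, 0)})` returns a path that steps around the blocked cell at `(1, 0)` and ends at the goal. A real system would replace `predict_next` with a model learned from data and would search far more efficiently than this exhaustive loop.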

An AI model could use this kind of planning to help prevent workplace accidents by guiding robots along safe paths around the other robots and humans working alongside them, reducing potential hazards.

V-JEPA 2 helps AI agents understand the physical world and its interactions by learning patterns in how people interact with objects, how objects move in the physical world and how objects interact with other objects.

The company said that when it deployed the model on robots in its labs, the robots could use V-JEPA 2 to perform tasks such as reaching, picking up an object and placing it in a new location with ease.

“Of course, world models are essential for autonomous cars and robots,” said LeCun. “In fact, we believe world models will usher in a new era for robotics enabling real-world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data.”

In addition to the release of V-JEPA 2, Meta released three new benchmarks for the research community to evaluate existing reasoning models that use video to understand the world.

