UPDATED 08:00 EDT / OCTOBER 14 2021


With Ego4D, Facebook wants AI to understand the world from a first-person perspective

Facebook Inc. today announced a long-term project aimed at solving research challenges in artificial intelligence and first-person human perception.

The research produced in the project, called Ego4D, could be useful for numerous applications, including augmented reality, virtual reality and robotics. For example, AI capable of understanding the world from a first-person perspective could provide instructions for technicians, guide people through recipes, help people locate lost items and so on.

Facebook AI calls this “egocentric perception,” which differs from the approach of most of today’s computer vision systems. Those systems typically learn from photos and videos captured from a third-person perspective, where the camera is a spectator to the action.

“Next-generation AI systems will need to learn from an entirely different kind of data — videos that show the world from the center of the action, rather than the sidelines,” said Kristen Grauman, lead research scientist at Facebook.

To conduct this research, Facebook AI recruited a consortium of 13 universities and labs across nine countries. They collected more than 2,200 hours of first-person video in the wild, featuring more than 700 participants going about their daily lives. Most importantly, to be useful the AI must be able to provide familiar, in-context assistance for day-to-day activities, so the data needed to be captured in that context.

“Equally important as data collection is defining the right research benchmarks or tasks,” Grauman said. “A major milestone for this project has been to distill what it means to have intelligent egocentric perception, where we recall the past, anticipate the future, and interact with people and objects.”

Using the Ego4D data set, Facebook AI defined five benchmarks for AI applications: episodic memory, forecasting, hand-object interaction, audio-visual memory and social interaction.

Episodic memory examples could include an AI answering questions about personal memories from egocentric video capture, such as finding misplaced keys. Although it’s easy to forget where the keys were mislaid, an AI could quickly scan back through the recorded video and spot them lying on a table (or left in the fridge) using a camera on wearable glasses.

With forecasting, AI could provide helpful guidance during task-oriented activities such as cooking, construction, repair work or other technical jobs. An AI could use the wearer’s camera to understand what has previously happened and predict what’s likely to happen next. Combined with hand-object interaction, it could recognize that users had already added salt to their food and warn them when they reach for the salt yet again.

AI could also be used to augment audio-visual memory and social interaction. For example, if someone misses something important during a class because of a distraction, they could ask their assistant for a summary. An AI with social intelligence could understand eye contact and who is talking to whom, so an AI assistant could make it easier to focus during a noisy dinner party.

The objective of Ego4D is to allow AI to gain a deeper understanding of how people go about their day-to-day lives as they normally would so that it can better contextualize and personalize experiences. As a result, AI assistants could have a positive impact on how people live, work and play.

The data sets will be available in November of this year for researchers who sign Ego4D’s data use agreement.

Image: Facebook AI
