

A week after Facebook Inc. introduced two datasets aimed at helping developers train their computer vision models, Google LLC has upped the ante with a contribution of its own.
The company on Thursday released AVA, a vast collection of video content that can be used to hone an artificial intelligence’s observation skills. Both Google and Facebook have picked clips that portray everyday actions such as walking. At first glance, it’s a rather narrow focus given the many other areas in which computer vision is being applied these days. What makes the datasets significant is that the ability to interpret human actions is a key requirement for some of the most cutting-edge applications of AI.
Google created AVA to address the lack of computer vision datasets that feature complex scenes with multiple people performing different activities. As part of the project, the search giant’s researchers extracted 15-minute videos from long-form content on YouTube, mainly old films, and split each one into 300 three-second segments. Those short clips were in turn manually tagged with labels describing the actions shown on the screen.
AVA contains a total of 57,600 segments with some 210,000 action labels, according to Google. The breadth of the dataset could help computer vision models pick up on the differences in how people perform a given action. Because human behavior is so varied, activities have historically been harder to categorize than objects.
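The numbers above imply multiple labels per segment (roughly 210,000 labels across 57,600 clips), since one person in a three-second clip can be doing several things at once. A minimal sketch of how such per-segment annotations might be grouped is below; the column layout used here (video ID, segment timestamp, a bounding box, an action ID) is an illustrative assumption, not Google’s published schema:

```python
import csv
from collections import defaultdict
from io import StringIO

# Hypothetical annotation rows in a CSV layout assumed for illustration:
# video_id, segment_timestamp_sec, x1, y1, x2, y2, action_id
SAMPLE = """\
-5KQ66BBWC4,902,0.077,0.151,0.283,0.811,80
-5KQ66BBWC4,902,0.077,0.151,0.283,0.811,12
-5KQ66BBWC4,905,0.226,0.032,0.366,0.497,17
"""

def group_actions(csv_text):
    """Group action labels by (video_id, timestamp): a single
    three-second segment can carry several action labels."""
    segments = defaultdict(list)
    for row in csv.reader(StringIO(csv_text)):
        video_id, timestamp = row[0], int(row[1])
        action_id = int(row[6])
        segments[(video_id, timestamp)].append(action_id)
    return dict(segments)

labels = group_actions(SAMPLE)
print(labels[("-5KQ66BBWC4", 902)])  # prints [80, 12]: two labels on one segment
```

Grouping by segment rather than by label is what lets a model see co-occurring actions, such as the singing-while-playing pattern Google’s researchers describe.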
AVA might also help AI systems learn how to detect certain patterns better. For example, Google’s researchers noted in a blog post that the dataset shows actors who sing during a scene often also play an instrument while they’re at it. The fact that the individual segments form continuous 15-minute videos could potentially let computer vision models look for much deeper patterns as well.
Enabling AI to better identify human actions could prove useful in a variety of areas. A drone maker, for example, may benefit from the ability to customize flight patterns based on what users are doing. The technology has the potential to be even more valuable in industrial environments such as factories where robots operate alongside human workers.