UPDATED 12:00 EDT / OCTOBER 09 2019

IBM and MIT break new ground in video recognition model training

IBM Corp. has teamed up with researchers from the Massachusetts Institute of Technology to create a new method for training “video recognition” deep learning models more efficiently.

Deep learning is a branch of machine learning that aims to replicate how the human brain solves problems. It has led to major breakthroughs in areas such as language translation and image and voice recognition.

Video recognition is similar to image classification in that the deep learning model tries to identify what’s going on in a video, including the objects and people it sees and what they’re doing. The main difference is that a video has far more moving parts than a single static image, so training deep learning models to understand it takes much more time and effort.

“By one estimate, training a video recognition model can take up to 50 times more data and eight times more processing power than training an image classification model,” MIT explained in a blog post today.

Of course, no one likes devoting huge amounts of compute resources to such a task because it can often be prohibitively expensive. Moreover, the resources needed make it next to impossible to run video recognition models on low-powered mobile devices, where many AI applications are going.

Those problems are what inspired a research team led by Song Han, an assistant professor at MIT’s Department of Electrical Engineering and Computer Science, to come up with a more efficient model for video recognition training. The new technique dramatically reduces the size of video recognition models in order to speed up training times and improve performance on mobile devices.

“Our goal is to make AI accessible to anyone with a low-power device,” Han said. “To do that we need to design efficient AI models that use less energy and can run smoothly on edge devices where so much of AI is moving.”

Image classification models work by looking for patterns in the pixels of an image in order to build up a representation of what they see. With enough examples, the models can learn to recognize people, objects and the ways they relate to one another.

Video recognition works in a similar way, but the deep learning models go further by using “three-dimensional convolutions” to encode the passage of time in a sequence of images (video frames), which leads to bigger and more computationally intensive models. To reduce the calculations involved, Han and his colleagues designed an operation they call a “temporal shift module,” which shifts a portion of each video frame’s feature maps to its neighboring frames. By mingling spatial representations of the past, present and future, the model gets a sense of time passing without explicitly representing it.
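
To make the shifting idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the researchers' implementation: each frame is reduced to one number per channel (a real feature map would hold a two-dimensional grid per channel), and the function name `temporal_shift` and the `shift_div` parameter are hypothetical names chosen for this example. The sketch assumes one slice of channels is pulled from the next frame, another slice from the previous frame, and the rest stay put, with zeros filling in where no neighbor exists.

```python
from typing import List

# One value per channel; a real feature map would hold a 2-D grid per channel.
Frame = List[float]


def temporal_shift(frames: List[Frame], shift_div: int = 4) -> List[Frame]:
    """Shift a fraction of channels one step along the time axis.

    frames[t][c] is the feature for channel c at time step t.
    shift_div is a hypothetical parameter for this sketch: 1/shift_div of
    the channels take their value from the next frame, another 1/shift_div
    from the previous frame, and the remaining channels are left in place.
    """
    num_frames = len(frames)
    if num_frames == 0:
        return []
    num_channels = len(frames[0])
    fold = num_channels // shift_div

    shifted = []
    for t in range(num_frames):
        row = []
        for c in range(num_channels):
            if c < fold:
                src = t + 1  # pull this channel from the future frame
            elif c < 2 * fold:
                src = t - 1  # pull this channel from the past frame
            else:
                src = t      # leave this channel untouched
            # Zero-fill when the neighboring frame does not exist.
            row.append(frames[src][c] if 0 <= src < num_frames else 0.0)
        shifted.append(row)
    return shifted


# Three frames with four channels each; each value encodes frame * 10 + channel.
frames = [[float(t * 10 + c) for c in range(4)] for t in range(3)]
shifted = temporal_shift(frames, shift_div=4)
for row in shifted:
    print(row)
```

With these toy inputs, the middle frame comes out as `[20.0, 1.0, 12.0, 13.0]`: its first channel now carries information from the following frame and its second from the preceding one, while the last two are unchanged. The shift itself only moves data around and involves no multiplications, which is the intuition behind the efficiency gain described above.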

The new technique resulted in a model that can be trained three times faster than existing models on the Something-Something video dataset, which is a collection of densely labeled video clips that show humans performing predefined basic actions with everyday objects.

The model can even understand people’s movements in real time and is also extremely power-efficient. For example, it enabled a single-board computer rigged to a video camera to instantly classify hand gestures, using the same amount of energy required to power a bike light.

Machine learning is still in its early phases and so are the gains that can be achieved with innovative approaches such as this, said Holger Mueller, principal analyst and vice president at Constellation Research Inc. “Today it is the turn of MIT and IBM to accelerate video recognition, which happens to be one of the hardest ML jobs there is.”

IBM and MIT say their new video recognition model could have useful applications in a variety of fields. For example, it could be used to help catalog videos on YouTube or a similar service more quickly. It could also enable hospitals to run AI applications locally instead of in the cloud, helping to keep confidential data more secure.

Image: mohamed_hassan/Pixabay

