Google’s VideoBERT algorithm predicts the future one cooking video at a time
Google LLC today debuted VideoBERT, an artificial intelligence that can watch part of a video and extrapolate what will happen in the next few seconds, much as a human can.
Equipping a computer with the ability to understand and draw correct conclusions from a visual scene requires an incredibly sophisticated algorithm. For Google’s researchers, however, the challenge wasn’t building the algorithm but finding enough data with which to train it. Machine learning models must ingest enormous amounts of information to understand even basic concepts, and that information typically must be prepared by hand.
That wasn’t feasible for VideoBERT, since teaching the model how to predict future events required more sample videos than Google’s researchers could have assembled by hand. They would additionally have had to write descriptions for each individual frame of every clip just so the AI could follow what’s happening. So the team came up with an alternative: freely available instructional videos.
In a video that shows how to cook an omelette or fill a tire, the person demonstrating the task will often explain each step as they perform it, narration that the researchers used as a substitute for the frame-by-frame descriptions they would have had to create for the AI otherwise. The team compiled over a million clips spanning categories such as cooking and gardening. They then fed them to VideoBERT to teach the model how to trace the progress of common activities.
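The core trick behind this kind of training is turning both modalities into one stream of discrete tokens: each video frame’s feature vector is quantized into a “visual word,” and those visual tokens are concatenated with the narration’s text tokens so a BERT-style model can learn over the combined sequence. The sketch below is a simplified illustration of that idea, assuming toy centroids and a made-up vocabulary; it is not Google’s actual pipeline.

```python
import numpy as np

def quantize_frames(frame_features, centroids):
    """Map each frame feature vector to the index of its nearest centroid,
    producing discrete 'visual word' tokens (a stand-in for the clustering
    step VideoBERT-style models use)."""
    # Broadcast to pairwise distances: (num_frames, num_clusters)
    dists = np.linalg.norm(
        frame_features[:, None, :] - centroids[None, :, :], axis=-1
    )
    return dists.argmin(axis=1)

def build_sequence(visual_tokens, text_tokens, text_vocab_size):
    """Concatenate modalities as [CLS] text... [SEP] visual... [SEP].
    Visual token ids are offset past the text vocabulary so both
    modalities can share a single embedding table."""
    CLS, SEP = 0, 1  # hypothetical special-token ids
    offset = text_vocab_size
    return (
        [CLS] + list(text_tokens) + [SEP]
        + [offset + int(v) for v in visual_tokens] + [SEP]
    )

# Toy example: 4 frames with 3-dim features, 2 visual clusters
centroids = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
frames = np.array([
    [0.1, 0.0, 0.1],
    [0.9, 1.0, 1.1],
    [0.0, 0.2, 0.1],
    [1.1, 0.9, 1.0],
])
visual = quantize_frames(frames, centroids)
seq = build_sequence(visual, text_tokens=[5, 8, 3], text_vocab_size=100)
print(list(visual))  # → [0, 1, 0, 1]
print(seq)           # → [0, 5, 8, 3, 1, 100, 101, 100, 101, 1]
```

A masked-prediction objective over sequences like this is what lets the model learn correlations between what is said and what is shown, without any hand-written frame labels.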
After the training, the model was set loose on a collection of cooking videos it had never seen before. When presented with a video fragment showing a bowl of flour and cocoa powder, VideoBERT astutely predicted that the ingredients would be placed in an oven and become a brownie or a cupcake. The researchers also managed to harness the algorithm’s observation skills to extract a recipe from a video in which a chef explained how to cook a steak.
The methods Google developed to train VideoBERT could eventually find use in far more serious applications. Self-driving cars, for instance, might become safer if they gained the ability to predict accurately where nearby vehicles will be a few seconds into the future. Such foresight can also be a big asset for drones and industrial robots that operate in close proximity to human workers.