UPDATED 16:37 EDT / JANUARY 25 2024

Google researchers develop new diffusion-based AI system for generating videos

Google LLC has revealed Lumiere, an artificial intelligence system for generating videos that the company says can outperform earlier models in the category.

The Alphabet Inc. unit detailed the system in a research paper published on Tuesday. According to Google, Lumiere is capable of generating five-second videos with a resolution of 1024 by 1024 pixels. It can create clips based on a text prompt or image provided by the user, as well as edit existing clips.

Under the hood, Lumiere comprises two separate AI models. The first, which Google’s researchers have named Space-Time U-Net, generates an initial low-resolution clip based on the user’s prompt. The second model increases the resolution of that clip to produce the final 1024-by-1024-pixel version that Lumiere provides as output.

According to Google, the Space-Time U-Net model that creates the initial low-resolution video is based on a so-called diffusion architecture. This architecture underpins many of the most popular AI image generators on the market.

What sets diffusion models apart from other neural networks is the way they’re trained. During training, a diffusion model is given a set of images to which a type of random distortion known as Gaussian noise has been added. It must then remove the noise to recreate the original images, a process through which it learns how to generate entirely new images from scratch.
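The training idea described above can be sketched in a few lines of Python. This is a minimal illustration, not Lumiere's training code: the function names, the `noise_level` parameter and the choice to score the model on how well it predicts the added noise (one common formulation) are all assumptions made for the example.

```python
import math
import random


def add_gaussian_noise(pixels, noise_level, rng):
    """Blend a clean image with Gaussian noise.

    pixels: flat list of floats standing in for a clean image.
    noise_level: 0.0 leaves the image intact, 1.0 replaces it with pure noise.
    Returns the noisy image and the noise that was mixed in.
    """
    keep = math.sqrt(1.0 - noise_level)
    mix = math.sqrt(noise_level)
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [keep * p + mix * n for p, n in zip(pixels, noise)]
    return noisy, noise


def denoising_loss(predicted_noise, true_noise):
    """Mean squared error between the predicted and the actual noise."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_noise, true_noise)]
    return sum(squared) / len(true_noise)


rng = random.Random(0)
image = [0.2, 0.5, 0.9, 0.1]  # hypothetical four-pixel "image"
noisy, noise = add_gaussian_noise(image, noise_level=0.3, rng=rng)

# A perfect denoiser would predict exactly the noise that was added.
perfect_loss = denoising_loss(noise, noise)
# A model that assumes there is no noise at all scores worse.
naive_loss = denoising_loss([0.0] * len(noise), noise)
```

During real training, a neural network would supply the noise prediction and the loss would be used to adjust its weights; here the two hand-written predictions only show what the loss rewards.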

Google’s researchers based Space-Time U-Net on an existing, open-source diffusion model. They customized it by adding software components that downsample and then upsample the video as it is processed, reducing and later restoring its resolution. They also equipped the model with support for attention, a machine learning technique that allows neural networks to filter the information they consider when making decisions and disregard irrelevant data points.
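To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention, a widely used form of the technique. It is an illustration only: the article does not say which attention variant Lumiere uses, and the function names and toy vectors below are assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Weight each value by how closely its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    width = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]
    return weights, output


query = [1.0, 0.0]
keys = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]  # only the first key matches the query
values = [[1.0], [2.0], [3.0]]
weights, output = attend(query, keys, values)
```

Nearly all of the weight lands on the first item, so the output is dominated by its value while the poorly matching items are effectively disregarded.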

Space-Time U-Net is not the only AI model capable of generating videos. However, it takes a different approach to the task than earlier neural networks in the category.

A video is a sequence of images, or frames, displayed one after another. AI systems typically generate that sequence in two phases: They create the first and last images in the sequence, then add the remaining frames.

Space-Time U-Net goes about the task differently. Instead of generating a clip’s frames piecemeal, it creates the entire clip in one pass. Google says this approach allows Lumiere to generate higher-fidelity videos than many existing neural networks.
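The two-phase approach that Lumiere moves away from can be illustrated with a toy example. In this sketch, simple linear interpolation stands in for the learned model that would normally fill in the frames between the first and last images; the function name and the two-value "frames" are hypothetical.

```python
def fill_between_keyframes(first, last, num_frames):
    """Build a clip by interpolating linearly between two keyframes.

    first, last: lists of floats standing in for the first and last frames.
    num_frames: total number of frames in the clip, including both keyframes.
    """
    if num_frames < 2:
        raise ValueError("a clip needs at least the two keyframes")
    frames = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append([(1 - t) * a + t * b for a, b in zip(first, last)])
    return frames


# Phase one produced the keyframes; phase two fills in the three frames between them.
clip = fill_between_keyframes([0.0, 0.0], [1.0, 2.0], num_frames=5)
```

Every in-between frame here is derived from the keyframes alone, which is the kind of piecemeal construction that a single-pass model avoids.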

The company evaluated the AI system’s capabilities by having it generate a series of five-second clips based on 113 different prompts. For good measure, it also used a benchmark dataset called UCF101 in the assessment. The researchers determined that Lumiere achieved “state-of-the-art video generation results” compared with other neural networks designed for the same task.

Besides generating video content based on text prompts, Lumiere can also create clips in the style of a reference image uploaded by the user. It’s likewise capable of modifying existing clips. Furthermore, Google says, Lumiere can create animations called cinemagraphs in which only some elements move while the others remain still.

Image: Google

