Google follows Meta in introducing text-to-video AI
Researchers at Google LLC’s AI lab, Google Brain, today unveiled Imagen Video, a program that can create high-quality videos from text, similar to what Meta Platforms Inc. introduced last week.
Google calls Imagen Video a “text-conditional video generation system based on a cascade of video diffusion models.” Given just a text prompt, it says, the system can generate high-definition videos “using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models.”
The generator produces 1280×768 HD video at 24 frames per second. It’s still in development, but it’s already a considerable step up from Imagen, Google’s text-to-image generation model, which debuted earlier this year. With Imagen, if you wanted to see a still image of a spaceman riding a horse, you could; now it seems you can have your astronaut-horse team galloping through space.
To train the video generator, Google showed it a vast range of videos and still images, each labeled with text. When a text prompt is later entered, the generator synthesizes new video from the patterns it learned in that data. The training data comprised 14 million videos and 60 million still images, as well as the 400 million images in the LAION-400M open dataset. Google showed some examples, such as a panda eating and a teddy bear doing various things.
Google acknowledged that video manipulation technology always carries dangers, such as the creation of what have come to be known as deepfakes. Such technology is already a problem, and as systems advance, it could become a much bigger one for society.
“Video generative models can be used to positively impact society, for example, by amplifying and augmenting human creativity,” the company said. “However, these generative models may also be misused, for example, to generate fake, hateful, explicit or harmful content. We have taken multiple steps to minimize these concerns, for example, in internal trials, we apply input text prompt filtering, and output video content filtering.”