UPDATED 17:16 EST / DECEMBER 16 2024

AI

Google debuts Veo 2 video generator, upgraded Imagen 3 with Whisk remix tool

Google LLC today debuted Veo 2, an artificial intelligence model capable of generating high-resolution videos up to two minutes in length.

The company is rolling out the algorithm alongside two other additions to its AI portfolio. The first is a new version of Imagen 3, Google’s flagship AI image generator, that has received an output quality boost. The search giant also debuted a tool called Whisk that uses Imagen 3 to remix existing images. 

The new Veo 2 model generates videos based on natural language prompts. Users can enter up to several sentences describing what objects a clip should depict, as well as the manner in which those objects should be rendered. They can also optionally enter instructions for specific points in time, such as a video’s ending.

Veo 2 enables users to customize a clip’s cinematographic settings. The AI can simulate camera features such as a specific type of lens or film roll cartridge. For example, users could instruct Veo 2 to generate a video as if it were shot with an 18-millimeter lens optimized for capturing wide-angle footage.

The model likewise supports cinematic effects. In one example, Google showed Veo 2 generating a video with volumetric lighting. This is a rendering method for generating realistic-looking beams of light.

Veo 2 is the successor to an eponymous AI video generator that Google debuted in May. Compared to its predecessor, the new model produces more realistic and detailed clips with up to 4K resolution, which corresponds to 3840 pixels by 2160 pixels. Google says that Veo 2 is also less prone to hallucinations.

The search giant put the model to the test by evaluating it with MovieGenBench, a benchmarking tool that Meta Platforms Inc. open-sourced earlier this year. As part of the evaluation, 1,003 users compared Veo 2 with several other video generators. The model outperformed the competition, including OpenAI’s newly released Sora Turbo, on “overall preference” and with respect to its ability to accurately follow prompts. 

“It brings an improved understanding of real-world physics and the nuances of human movement and expression, which helps improve its detail and realism overall,” Google DeepMind research scientist Aäron van den Oord and Elias Roman, senior director of product management for Google Labs, detailed in a blog post.

At the same time, Veo 2 has certain limitations. Google detailed that the model struggles to keep frames consistent with one another “throughout complex scenes or those with complex motion.”

The company plans to integrate Veo 2 into several products including YouTube Shorts and Vertex AI, Google Cloud’s AI development toolkit. Initially, the model will be available in Google Labs, a service that provides early access to the search giant’s newest AI features. Access requires joining a waitlist.

Veo 2 is rolling out to Google Labs alongside a new version of Imagen 3, the company’s most advanced AI image generator. Compared to the original Imagen 3, it generates brighter images with “richer details and textures.” It’s also better at following user prompts.

Imagen 3 powers Whisk, a new service that will likewise be accessible via Google Labs. It allows users to combine multiple existing images into a new one. Whisk can, for example, apply the style of one image to another’s background.

Under the hood, Whisk uses not only Imagen 3 but also Google’s Gemini series of large language models. When users upload photos they wish to combine, Gemini generates a detailed caption for each image. Those captions help Imagen 3 determine how to carry out the remixing process. 
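The caption-then-generate flow described above can be illustrated with a minimal Python sketch. It is not Google’s implementation and calls no real Gemini or Imagen API: the function names `build_prompt` and `remix`, the prompt wording, and the stand-in captioner and generator are all hypothetical, chosen only to show how per-image captions might be merged into a single text prompt for an image model.

```python
from typing import Callable, List


def build_prompt(captions: List[str]) -> str:
    """Merge per-image captions into one text prompt (hypothetical wording)."""
    parts = [f"Reference {i}: {c}" for i, c in enumerate(captions, start=1)]
    return "Combine the following references into one image. " + " ".join(parts)


def remix(
    image_paths: List[str],
    caption_image: Callable[[str], str],   # stand-in for a captioning model
    generate_image: Callable[[str], str],  # stand-in for an image generator
) -> str:
    """Caption each input image, then hand the merged captions to the generator."""
    captions = [caption_image(path) for path in image_paths]
    return generate_image(build_prompt(captions))


if __name__ == "__main__":
    # Placeholder callables; a real system would call actual models here.
    def fake_caption(path: str) -> str:
        return f"a detailed description of {path}"

    def fake_generate(prompt: str) -> str:
        return f"<image generated from: {prompt}>"

    print(remix(["subject.png", "style.png"], fake_caption, fake_generate))
```

Passing the captioner and generator in as callables keeps the sketch independent of any particular model or service. The only step it models is the handoff of text captions from one component to the other.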

Image: Google
