UPDATED 17:16 EDT / DECEMBER 16 2024

Google debuts Veo 2 video generator, upgraded Imagen 3 with Whisk remix tool

Google LLC today debuted Veo 2, an artificial intelligence model capable of generating high-resolution videos up to two minutes in length.

The company is rolling out the algorithm alongside two other additions to its AI portfolio. The first is a new version of Imagen 3, Google’s flagship AI image generator, that has received an output quality boost. The search giant also debuted a tool called Whisk that uses Imagen 3 to remix existing images. 

The new Veo 2 model generates videos based on natural language prompts. Users can enter up to several sentences describing what objects a clip should depict, as well as the manner in which those objects should be rendered. It’s optionally possible to enter instructions for specific points in time, such as a video’s ending. 

Veo 2 enables users to customize a clip’s cinematographic settings. The AI can simulate camera features such as a specific type of lens or film roll cartridge. For example, users could instruct Veo 2 to generate a video as if it were shot with an 18-millimeter lens optimized for capturing wide-angle footage.

The model likewise supports cinematic effects. In one example, Google showed Veo 2 generating a video with volumetric lighting. This is a rendering method for generating realistic-looking beams of light.

Veo 2 is the successor to an eponymous AI video generator that Google debuted in May. Compared to its predecessor, the new model produces more realistic and detailed clips with up to 4K resolution, which corresponds to 3840 pixels by 2160 pixels. Google says that Veo 2 is also less prone to hallucinations.
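To put the 4K figure in perspective, the short Python sketch below works out the per-frame pixel count. The comparison against 1920-by-1080 Full HD is added here purely for scale and is not part of Google's announcement; the function and constant names are illustrative.

```python
# Per-frame pixel counts. The Full HD comparison is an illustrative
# addition, not a figure from Google's announcement.
UHD_4K = (3840, 2160)
FULL_HD = (1920, 1080)


def pixel_count(resolution):
    """Return the number of pixels in a single frame."""
    width, height = resolution
    return width * height


pixels_4k = pixel_count(UHD_4K)           # 8,294,400 pixels per frame
ratio = pixels_4k / pixel_count(FULL_HD)  # 4.0, i.e. four times Full HD
print(pixels_4k, ratio)
```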

The search giant put the model to the test by evaluating it with MovieGenBench, a benchmarking tool that Meta Platforms Inc. open-sourced earlier this year. As part of the evaluation, 1,003 users compared Veo 2 with several other video generators. The model outperformed the competition, including OpenAI’s newly released Sora Turbo, on “overall preference” and with respect to its ability to accurately follow prompts. 

“It brings an improved understanding of real-world physics and the nuances of human movement and expression, which helps improve its detail and realism overall,” Google DeepMind research scientist Aäron van den Oord and Elias Roman, senior director of product management for Google Labs, detailed in a blog post.

At the same time, Veo 2 has certain limitations. Google detailed that the model struggles to keep frames consistent with one another “throughout complex scenes or those with complex motion.”

The company plans to integrate Veo 2 into several products including YouTube Shorts and Vertex AI, Google Cloud’s AI development toolkit. Initially, the model will be available in Google Labs, a service that provides early access to the search giant’s newest AI features. Access is gated by a waitlist.

Veo 2 is rolling out to Google Labs alongside a new version of Imagen 3, the company’s most advanced AI image generator. Compared to the original Imagen 3, the new version generates brighter images with “richer details and textures.” It’s also better at following user prompts.

Imagen 3 powers Whisk, a new service that will likewise be accessible via Google Labs. It allows users to combine multiple existing images into a new one. Whisk can, for example, apply the style of one image to another’s background.

Under the hood, Whisk uses not only Imagen 3 but also Google’s Gemini series of large language models. When users upload photos they wish to combine, Gemini generates a detailed caption for each image. Those captions help Imagen 3 determine how to carry out the remixing process. 
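The caption-then-generate flow described above can be sketched roughly as follows. Google has not published Whisk's implementation, so this is only a minimal illustration: `RemixRequest`, `remix`, `fake_captioner` and `fake_generator` are hypothetical names, the stand-in functions replace Gemini and Imagen 3, and no real Google API is called.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class RemixRequest:
    """Hypothetical container for the images a user wants to combine."""
    image_paths: Sequence[str]


def remix(
    request: RemixRequest,
    caption_image: Callable[[str], str],
    generate_image: Callable[[str], bytes],
) -> bytes:
    """Caption every input image, then generate one image from the captions."""
    # Step 1: a language model (Gemini, per the article) describes each image.
    captions = [caption_image(path) for path in request.image_paths]
    # Step 2: the captions guide the image generator (Imagen 3, per the article).
    # Joining them into a single prompt is an assumption made for this sketch.
    combined_prompt = " ".join(captions)
    return generate_image(combined_prompt)


# Stand-in models so the sketch runs without any external service.
def fake_captioner(path: str) -> str:
    return f"A detailed description of {path}."


def fake_generator(prompt: str) -> bytes:
    return prompt.encode("utf-8")


result = remix(
    RemixRequest(["subject.png", "style.png"]),
    fake_captioner,
    fake_generator,
)
```

Passing the two models in as plain callables keeps the sketch independent of any particular service; how Whisk actually combines the captions is not described in Google's announcement.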

Image: Google
