UPDATED 13:29 EST / MAY 24 2022


Google details new cutting-edge image generation AI

Google LLC today detailed Imagen, an artificial intelligence system that can automatically generate images based on text prompts provided by a user.

Over the past few years, researchers have developed multiple neural networks capable of automatically generating images. One of the most sophisticated entries in the category is DALL-E 2, an AI system detailed by OpenAI LLC earlier this year. According to Google, its newly announced Imagen system can outperform DALL-E 2 as well as other AI models in the category.

Imagen includes two separate neural networks. The first takes as input a piece of text that describes what image should be drawn. The neural network turns this description into a form that can be understood by Imagen’s second neural network, which is responsible for drawing the image.
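The two-stage hand-off described above can be sketched as a pair of functions: a text encoder that produces an embedding, and a generator that consumes it. The function names and the toy character-based encoding below are invented for illustration and bear no resemblance to Imagen’s actual interfaces.

```python
def text_to_embedding(prompt):
    # Stand-in for Imagen's first network: map each character to a number.
    # The real encoder is a large Transformer producing a dense vector.
    return [ord(c) / 255.0 for c in prompt]

def embedding_to_image(embedding, width=4, height=4):
    # Stand-in for Imagen's second network: derive pixel values from the
    # embedding. The real second stage is a diffusion model.
    total = sum(embedding)
    return [[total * ((x + y) % 2) for x in range(width)]
            for y in range(height)]

# The only contract between the two stages is the embedding itself.
prompt = "a corgi"
embedding = text_to_embedding(prompt)
image = embedding_to_image(embedding)
```

The point of the split is that each stage can be trained and improved independently, as long as the embedding format stays fixed.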

To build Imagen, Google drew on a number of key advances in AI research that were made over the past decade. 

The first neural network in Imagen, which is responsible for translating a text description into a form that the system can understand, is a so-called Transformer model. Transformer models are a type of natural language processing algorithm that was invented by Google in 2017. They can understand the meaning of text more accurately than earlier algorithms.

A Transformer model relies on context to understand the meaning of the words in a sentence. It analyzes the text that surrounds a word, determines which specific pieces of text influence the word’s meaning the most and weighs those pieces most heavily when interpreting the word. Google’s new Imagen system uses a Transformer model to turn an image description provided by a user into an embedding, a mathematical representation of data that neural networks can understand.
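The context-weighing mechanism at the heart of a Transformer is scaled dot-product attention. The toy version below blends three token vectors according to how closely each “key” matches a “query”; the vectors and helper names are made up for the example, and production models use far larger learned vectors.

```python
import math

def softmax(scores):
    # Numerically stable softmax: turn raw scores into weights summing to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Compare the query against every key, then blend the values
    # according to the resulting weights -- tokens whose keys match
    # the query most closely influence the output the most.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    blended = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, blended

# Three toy token vectors; the second key matches the query exactly,
# so it should receive the largest attention weight.
query = [1.0, 0.0]
keys = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, blended = attention(query, keys, values)
```

Stacking many such attention operations, with learned queries, keys and values, is what lets the model decide which surrounding words matter for each word in the prompt.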

After the image description is turned into an embedding, a second AI integrated into Imagen uses it to draw the corresponding image. This second AI is a so-called diffusion model, a type of neural network that was first developed in 2015. 

Such neural networks differ from other image generation algorithms in the way they are trained. To train a diffusion model, engineers first supply it with images that have been corrupted with a type of random visual distortion known as Gaussian noise. The diffusion model is then given the task of removing that noise. Once it has learned to reverse the corruption, the model can generate entirely new images by starting from pure noise and progressively denoising it, guided in Imagen’s case by the text embedding.
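A minimal sketch of that training setup, using a toy four-pixel “image”: the blend of signal and noise and the mean-squared-error target below follow the standard diffusion recipe, but the specific values and function names are invented for illustration.

```python
import math
import random

random.seed(0)

def add_gaussian_noise(image, noise_level):
    # Forward diffusion step: blend the image with Gaussian noise.
    # noise_level runs from 0 (clean) to 1 (pure noise); the square-root
    # weights keep the overall signal variance roughly constant.
    noise = [random.gauss(0.0, 1.0) for _ in image]
    noisy = [math.sqrt(1 - noise_level) * x + math.sqrt(noise_level) * n
             for x, n in zip(image, noise)]
    return noisy, noise

def denoising_loss(predicted_noise, true_noise):
    # Mean squared error between the model's guess at the noise and the
    # noise that was actually added -- the quantity training minimizes.
    return sum((p - t) ** 2
               for p, t in zip(predicted_noise, true_noise)) / len(true_noise)

image = [0.2, 0.8, 0.5, 0.1]            # toy 4-pixel "image"
noisy, noise = add_gaussian_noise(image, 0.3)
perfect = denoising_loss(noise, noise)  # a perfect predictor scores 0
```

Training repeats this at many noise levels, so the model learns to take one small denoising step from any point between pure noise and a clean image.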

AI researchers commonly use a dataset called COCO to compare the effectiveness of image generation algorithms. Google says that Imagen significantly outperformed competing AI systems, including OpenAI’s cutting-edge DALL-E 2 system, in an internal test that used COCO. Imagen also managed to outperform the competition in a separate test based on DrawBench, a new benchmark developed by Google.

Google’s announcement of Imagen comes a few weeks after the search giant debuted PaLM, another cutting-edge AI developed by its researchers. It’s designed for natural language processing tasks and features 540 billion parameters, the configuration settings that help determine how a neural network makes decisions. According to Google, PaLM can outperform OpenAI’s sophisticated GPT-3 neural network when performing certain tasks.
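For a sense of how such parameter counts arise, a fully connected layer contributes one weight per input-output pair plus one bias per output unit. The layer sizes below are arbitrary toy values, vanishingly small next to PaLM’s 540 billion parameters.

```python
def dense_layer_params(n_in, n_out):
    # One weight per (input, output) pair, plus one bias per output.
    return n_in * n_out + n_out

# A tiny two-layer network: 784 inputs -> 128 hidden units -> 10 outputs.
total = dense_layer_params(784, 128) + dense_layer_params(128, 10)
```

Scaling those layer widths and depths up by several orders of magnitude is what produces billion-parameter models.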

Image: Google
