UPDATED 13:28 EDT / JANUARY 06 2021

OpenAI’s newest AI models draw and recognize objects more efficiently

by Maria Deutscher

Researchers at OpenAI have developed two neural networks that can draw objects based on natural-language user prompts and describe images with a high degree of accuracy.

The projects, detailed Tuesday, expand the range of tasks to which artificial intelligence can be applied. They also advance the AI research community’s goal of creating more versatile models that require less manual fine-tuning by engineers to produce accurate results.

DALL·E, the first new neural network, is a miniaturized version of the GPT-3 natural-language processing model that OpenAI debuted in 2020. GPT-3, one of the most complex neural networks created to date, can generate text and even software code from simple descriptions. DALL·E applies the same capability to drawing images based on user prompts.

The model’s standout capability is that it can produce images even in response to descriptions that it’s encountering for the first time and that are normally difficult for an AI to interpret. During testing performed by OpenAI researchers, the model successfully generated drawings in response to descriptions such as “an armchair in the shape of an avocado” and “a snail made of harp.” Moreover, the model is capable of generating images in several different styles.

The researchers decided to test exactly how versatile the AI is by having it tackle several additional tasks of varying difficulty. In one series of experiments, the model demonstrated an ability to generate the same image from multiple angles and with different levels of resolution. Yet another test showed that the model is sophisticated enough to customize individual details of the image it’s asked to generate.

“Simultaneously controlling multiple objects, their attributes, and their spatial relationships presents a new challenge,” OpenAI’s researchers wrote in a blog post. “For example, consider the phrase ‘a hedgehog wearing a red hat, yellow gloves, blue shirt, and green pants.’ To correctly interpret this sentence, DALL·E must not only correctly compose each piece of apparel with the animal, but also form the associations (hat, red), (gloves, yellow), (shirt, blue), and (pants, green) without mixing them up.”
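To make the associations in that quote concrete, here is a minimal Python sketch that pulls the (item, color) pairs out of the example phrase. It only illustrates the pairing the researchers describe; it says nothing about how DALL·E forms those associations internally. The function name `color_bindings` and the fixed `COLORS` set are hypothetical, chosen for this illustration.

```python
import re

# Hypothetical, hard-coded color vocabulary for this one example phrase.
COLORS = {"red", "yellow", "blue", "green"}

def color_bindings(phrase):
    """Return (item, color) pairs for every 'color item' word pair in the phrase."""
    words = re.findall(r"[a-z]+", phrase.lower())
    # Pair each color word with the word that immediately follows it.
    return [(nxt, word) for word, nxt in zip(words, words[1:]) if word in COLORS]

phrase = "a hedgehog wearing a red hat, yellow gloves, blue shirt, and green pants"
bindings = color_bindings(phrase)
print(bindings)
```

A word-pairing rule like this works only because the phrase is so regular. The challenge the researchers point to is keeping the same pairs straight while rendering an image.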

OpenAI’s other newly detailed neural network, CLIP, focuses on recognizing objects in existing images rather than drawing new ones.

There are already computer vision models that classify images. However, most of them can identify only a narrow set of objects for which they are specifically trained. An AI that classifies animals in wildlife photos, for example, has to be trained on a large number of wildlife photos to produce accurate results. What sets OpenAI’s CLIP apart is that it’s capable of creating a description of an object it hasn’t encountered before.

CLIP’s versatility is the fruit of a new training approach the lab developed to build the model. For the training process, OpenAI used not a manually crafted image dataset but rather images sourced from the public web and their attached text captions. The captions enabled CLIP to build a broad lexicon of words associated with different types of objects, associations it can then use to describe objects it hasn’t seen before.

“Deep learning needs a lot of data, and vision models have traditionally been trained on manually labeled datasets that are expensive to construct and only provide supervision for a limited number of predetermined visual concepts,” wrote the researchers behind CLIP. “In contrast, CLIP learns from text-image pairs that are already publicly available on the internet.”
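One way to picture how a model trained on text-image pairs can label an object it was never explicitly trained on is to treat the image and each candidate caption as a vector and pick the caption whose vector points in the most similar direction. The Python sketch below shows only that matching step. The three-number vectors are hand-made toy values, not outputs of CLIP, and the helper names `cosine` and `best_caption` are assumptions made for this illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_caption(image_vec, caption_vecs):
    """Return the caption whose vector is most similar to the image vector."""
    return max(caption_vecs, key=lambda c: cosine(image_vec, caption_vecs[c]))

# Toy, hand-made vectors standing in for what a text encoder might produce.
caption_vecs = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a zebra": [0.0, 0.2, 0.9],
}

# Toy vector standing in for an encoded image.
image_vec = [0.1, 0.3, 0.8]

label = best_caption(image_vec, caption_vecs)
print(label)
```

Because the candidate captions are ordinary text, the list can be swapped for new categories without retraining, which is the kind of flexibility the article attributes to the model.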

Image: OpenAI
