OpenAI’s newest AI models draw and recognize objects more efficiently
Researchers at OpenAI have developed two neural networks that can draw objects based on natural-language user prompts and describe images with a high degree of accuracy.
The projects, detailed Tuesday, expand the range of tasks to which artificial intelligence can be applied. They also advance the AI research community’s goal of creating more versatile models that require less manual fine-tuning by engineers to produce accurate results.
DALL·E, the first of the two neural networks, is a smaller version of the GPT-3 natural-language processing model that OpenAI debuted in 2020. GPT-3, one of the most complex neural networks created to date, can generate text and even software code from simple descriptions. DALL·E applies the same capability to drawing images based on user prompts.
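DALL·E itself is not publicly available, but the general recipe the researchers describe, a transformer that reads the text prompt and then predicts the image as a sequence of discrete image tokens, one token at a time, can be illustrated with a toy PyTorch sketch. The model below is only an illustration of that idea: the vocabulary sizes, sequence lengths and random prompt are placeholder assumptions rather than OpenAI's actual configuration, and a real system would feed the sampled image tokens through a separately trained decoder to produce pixels.

```python
# Toy illustration (not OpenAI's code): text tokens and image tokens are
# treated as one sequence, and image tokens are sampled autoregressively
# conditioned on the prompt. All sizes here are placeholder assumptions.
import torch
import torch.nn as nn

TEXT_VOCAB, IMAGE_VOCAB = 1000, 512   # hypothetical vocabulary sizes
TEXT_LEN, IMAGE_LEN = 16, 64          # hypothetical sequence lengths

class TinyTextToImage(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # One embedding table covers both vocabularies; image tokens are
        # offset by TEXT_VOCAB so the two ranges do not collide.
        self.embed = nn.Embedding(TEXT_VOCAB + IMAGE_VOCAB, d_model)
        self.pos = nn.Embedding(TEXT_LEN + IMAGE_LEN, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, IMAGE_VOCAB)  # predicts the next image token

    def forward(self, tokens):
        seq_len = tokens.size(1)
        x = self.embed(tokens) + self.pos(torch.arange(seq_len, device=tokens.device))
        # Causal mask: each position may only attend to earlier positions.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device),
            diagonal=1,
        )
        return self.head(self.blocks(x, mask=mask))

@torch.no_grad()
def sample_image_tokens(model, text_tokens):
    """Autoregressively append IMAGE_LEN image tokens to the text prompt."""
    seq = text_tokens.clone()
    for _ in range(IMAGE_LEN):
        logits = model(seq)[:, -1]                            # next-token logits
        next_tok = torch.multinomial(logits.softmax(-1), 1)   # sample one token
        seq = torch.cat([seq, next_tok + TEXT_VOCAB], dim=1)  # shift into image range
    return seq[:, TEXT_LEN:] - TEXT_VOCAB                     # indices of image tokens

model = TinyTextToImage()                                # untrained, for illustration only
prompt = torch.randint(0, TEXT_VOCAB, (1, TEXT_LEN))     # stand-in for a tokenized caption
image_tokens = sample_image_tokens(model, prompt)        # a real model would decode these to pixels
```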
The model’s standout capability is that it can produce images even in response to descriptions it’s encountering for the first time, including ones that would normally be difficult for an AI to interpret. During testing performed by OpenAI researchers, the model successfully generated drawings in response to descriptions such as “an armchair in the shape of an avocado” and “a snail made of harp.” Moreover, the model is capable of generating images in several different styles.
The researchers decided to test exactly how versatile the AI is by having it tackle several additional tasks of varying difficulty. In one series of experiments, the model demonstrated an ability to generate the same image from multiple angles and with different levels of resolution. Yet another test showed that the model is sophisticated enough to customize individual details of the image it’s asked to generate.
“Simultaneously controlling multiple objects, their attributes, and their spatial relationships presents a new challenge,” OpenAI’s researchers wrote in a blog post. “For example, consider the phrase ‘a hedgehog wearing a red hat, yellow gloves, blue shirt, and green pants.’ To correctly interpret this sentence, DALL·E must not only correctly compose each piece of apparel with the animal, but also form the associations (hat, red), (gloves, yellow), (shirt, blue), and (pants, green) without mixing them up.”
OpenAI’s other newly detailed neural network, Clip, focuses on recognizing objects in existing images rather than drawing new ones.
There are already computer vision models that classify images in this manner. However, most of them can identify only the narrow set of objects they were specifically trained on. An AI that classifies animals in wildlife photos, for example, has to be trained on a large number of labeled wildlife photos to produce accurate results. What sets OpenAI’s Clip apart is that it can recognize objects it wasn’t specifically trained on, by matching images against arbitrary text descriptions.
Clip’s versatility is the result of a new training approach OpenAI developed for the model. Rather than a manually labeled image dataset, the lab trained Clip on images sourced from the public web together with their accompanying text captions. The captions enabled Clip to build a broad lexicon of words associated with different types of objects, associations it can then use to recognize objects it hasn’t seen before.
“Deep learning needs a lot of data, and vision models have traditionally been trained on manually labeled datasets that are expensive to construct and only provide supervision for a limited number of predetermined visual concepts,” detailed the researchers behind Clip. “In contrast, CLIP learns from text-image pairs that are already publicly available on the internet.”
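OpenAI has released code and pretrained weights for Clip on GitHub (github.com/openai/CLIP). Assuming that `clip` Python package, a local image file and a handful of candidate captions, all placeholders here, the zero-shot classification described above can be sketched as follows: the model embeds the image and each caption into a shared vector space and picks the caption whose embedding best matches the image, with no task-specific training.

```python
# Sketch of zero-shot classification with the publicly released CLIP package
# (github.com/openai/CLIP). "photo.jpg" and the candidate labels are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
labels = ["a photo of a dog", "a photo of a cat", "a photo of an armchair"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    # Similarity between the image embedding and each caption embedding
    # decides the label; no dataset of labeled examples is required.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

for label, p in zip(labels, probs[0]):
    print(f"{label}: {p:.2f}")
```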
Image: OpenAI