UPDATED 15:57 EST / FEBRUARY 07 2024


Apple, UC Santa Barbara researchers detail new MGIE image editing AI

A group of researchers from Apple Inc. and the University of California at Santa Barbara has detailed MGIE, an artificial intelligence system that allows users to edit images with natural language commands.

The researchers first revealed the project in an academic paper released last September. They published a revised version of the paper on Monday. VentureBeat reported today that MGIE has been selected for a presentation at the International Conference on Learning Representations, a prestigious AI research event.

Many AI models on the market can edit images based on natural language instructions. To carry out an edit reliably, such models require the user to provide a detailed description of the changes to be performed. In practice, however, users often enter only brief instructions, which limits the usefulness of AI-powered image editing tools.

The newly detailed MGIE system aims to address that limitation. According to the Apple and UC Santa Barbara researchers who developed it, the software can reliably edit an image even if the user describes the changes to be made in only a few words. MGIE achieves that reliability by combining a standard image editing AI with a large language model.

In their paper, the researchers provided several examples of how the system can be used. In one test, they gave MGIE a photo of a pizza together with the instruction “make it more healthy.” In response, the system produced an edited version of the photo that depicts a pizza with more vegetable toppings.

MGIE can not only add objects to an image but also remove existing ones. Moreover, the system can carry out broader edits that affect the entire photo rather than only certain sections. A user could, for example, ask MGIE to change an image’s brightness or increase the level of detail.

The system owes its ability to make edits based on brief, often ambiguous user commands to its built-in large language model. That model takes the user’s short prompt and turns it into a much more detailed set of instructions. Those instructions are then passed to a second neural network that carries out the requested photo edits.

In the test that saw the researchers use MGIE to edit a photo of a pizza, the built-in LLM rewrote the prompt “make it more healthy” to “the pizza includes vegetable toppings, such as tomatoes and herbs.” The system takes a similar approach when a user asks it to perform other types of changes such as background modifications.
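The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' code: the names `EditPipeline`, `toy_expand` and `toy_edit` are hypothetical, a lookup table stands in for the large language model, and a function that merely records the instruction stands in for the diffusion model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in: an "image" here is just a list of text labels.
Image = List[str]


@dataclass
class EditPipeline:
    """Illustrative two-stage editor: expand the prompt, then apply the edit."""
    expand: Callable[[str], str]         # stage 1: stand-in for the language model
    edit: Callable[[Image, str], Image]  # stage 2: stand-in for the diffusion model

    def run(self, image: Image, prompt: str) -> Tuple[Image, str]:
        detailed = self.expand(prompt)            # short prompt -> detailed instruction
        return self.edit(image, detailed), detailed


def toy_expand(prompt: str) -> str:
    # A lookup table in place of a real language model (assumption for the sketch).
    rewrites = {
        "make it more healthy":
            "the pizza includes vegetable toppings, such as tomatoes and herbs",
    }
    return rewrites.get(prompt, prompt)  # unknown prompts pass through unchanged


def toy_edit(image: Image, instruction: str) -> Image:
    # Records the instruction instead of changing pixels (assumption for the sketch).
    return image + [f"edited: {instruction}"]


pipeline = EditPipeline(expand=toy_expand, edit=toy_edit)
edited, detailed = pipeline.run(["pizza photo"], "make it more healthy")
print(detailed)
print(edited)
```

Keeping the two stages behind separate callables mirrors the division of labor the article describes: the prompt rewriter and the image editor can each be swapped out without touching the other.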

MGIE turns the automatically rewritten prompts into an image using a second built-in neural network. That neural network is a diffusion model, a type of AI particularly well-suited for image editing and generation tasks. MGIE uses a model from the popular Stable Diffusion series of open-source diffusion models.

The researchers trained the system on a dataset called IPr2Pr that was released last year. It comprises more than 1 million examples of AI-generated images and the prompts that were used to create them. Training was carried out using the PyTorch AI development framework on eight of Nvidia Corp.’s high-end A100 graphics cards.

After completing the development process, the researchers evaluated MGIE’s capabilities using a combination of automated benchmarks and manual reviews. They determined that the system “significantly strengthens” the quality of image edits compared with traditional AI models. 

