Meta pushes envelope on AI training efficiency with new CM3leon image generator
Meta Platforms Inc. today detailed CM3leon, an artificial intelligence model built for image generation tasks that can be trained using only a fraction of the hardware required by similar neural networks.
Meta says CM3leon has other notable features as well. Although it was trained on a limited amount of hardware, its image generation quality is comparable to that of the most advanced neural networks in its category, according to the company. Furthermore, CM3leon can perform a broader range of tasks than many competing systems.
Most advanced image generation models are based on a machine learning approach known as diffusion. To create a diffusion-based AI system, researchers assemble a set of images and introduce a type of error called Gaussian noise into each file. They then task an AI model with removing the Gaussian noise, an exercise through which the model learns how to create new images from scratch.
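The noising step described above can be illustrated with a minimal Python sketch. It is a simplified illustration of the general diffusion idea, not Meta's code or that of any particular model: the function names, the `keep_fraction` parameter and the four-value "image" are all hypothetical, and a real system would use a neural network and full-size images.

```python
import math
import random


def add_gaussian_noise(pixels, keep_fraction, rng):
    """Blend pixel values with Gaussian noise (hypothetical helper).

    keep_fraction near 1.0 leaves the image mostly intact; near 0.0 the
    result is mostly noise. Returns the noisy pixels and the noise used.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    signal_scale = math.sqrt(keep_fraction)
    noise_scale = math.sqrt(1.0 - keep_fraction)
    noisy = [signal_scale * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise


def denoising_loss(predicted_noise, true_noise):
    """Mean squared error between the predicted and the actual noise."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_noise, true_noise)]
    return sum(squared) / len(true_noise)


rng = random.Random(0)            # fixed seed so the run is repeatable
image = [0.1, 0.5, 0.9, 0.3]      # stand-in for a tiny grayscale image
noisy, noise = add_gaussian_noise(image, keep_fraction=0.7, rng=rng)

# A model that predicts the noise exactly gets zero loss; one that
# always predicts "no noise" is penalized.
perfect = denoising_loss(noise, noise)
untrained = denoising_loss([0.0] * len(noise), noise)
```

During training, the model sees only the noisy values and is rewarded for predicting the noise that was added. Subtracting that predicted noise, step by step, is what later lets it turn pure noise into a new image.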
Meta took a different approach with CM3leon. Instead of using the diffusion method, the company’s researchers based the model on the so-called Transformer architecture. That’s a neural network design most commonly used to build large language models such as OpenAI LP’s GPT-4.
CM3leon is not the first transformer-based image generator. However, Meta claims it’s significantly more efficient than other entries in the category. The company says it trained CM3leon using one-fifth of the compute infrastructure that earlier AI approaches would have required.
It also stands out in the accuracy department, according to Meta.
Image generation models often struggle to accurately follow the user’s description when drawing objects. Models frequently omit or misunderstand details specified by the input prompt. The more complex the object the user is seeking to draw, the more likely errors are to emerge.
CM3leon can generate images more accurately than many earlier systems, according to Meta. In a series of tests, the company successfully used the model to draw objects based on complex descriptions such as “small cactus wearing a straw hat and neon sunglasses in the Sahara desert.” Moreover, it set a new record in a popular benchmark used to assess the accuracy of text-to-image models.
“When comparing performance on the most widely used image generation benchmark (zero-shot MS-COCO), CM3Leon achieves an FID (Fréchet Inception Distance) score of 4.88, establishing a new state of the art in text-to-image generation and outperforming Google’s text-to-image model, Parti,” Meta researchers detailed in a blog post.
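For context, FID compares the statistics of features that an Inception network extracts from real images and from generated ones, and a lower score indicates a closer match. In its standard definition, with the symbols below introduced here for illustration rather than taken from Meta's post, $\mu_r, \Sigma_r$ are the mean and covariance of the real-image features and $\mu_g, \Sigma_g$ are those of the generated-image features:

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
$$

Two identical feature distributions give a score of zero, so the reported 4.88 describes a small gap between generated and real images on this benchmark.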
Many AI image generators can only take text as input. CM3leon, in contrast, is also capable of ingesting images. Users can upload a photo to the model and ask it to make edits, create a caption or answer natural language questions about the pictured objects.
It’s particularly adept at the latter two tasks, according to Meta. The company compared the AI’s captioning and question answering capabilities with two models that were trained on more than ten times as much data. In some tasks, CM3leon either matched or surpassed the performance of the two competing models.
“We believe CM3leon’s strong performance across a variety of tasks is a step toward higher-fidelity image generation and understanding,” Meta’s researchers wrote. “Models like CM3leon could ultimately help boost creativity and better applications in the metaverse.”
Image: Meta