UPDATED 11:00 EDT / MAY 05 2026

A collage of images displays dual-armed robots at work.

Ai2 releases MolmoAct 2, enhancing robot intelligence in the real world

Seattle-based artificial intelligence research institute Ai2, the Allen Institute for AI, today announced MolmoAct 2, its next-generation open-source foundation AI model aimed at enabling robots to operate in the real world.

Last August, the institute released the first iteration, MolmoAct, its first action reasoning model: a new class of AI models that reason about 3D environments before they act. Ai2 said MolmoAct 2 substantially outperforms proprietary robotics models on the market and handles various real-world tasks up to 37 times faster than its predecessor.

In addition to MolmoAct 2, Ai2 released a vast dataset named MolmoAct 2-Bimanual YAM, built to be the largest open-source collection of bimanual, or “two-armed,” robot demonstrations ever published, containing more than 720 hours of training data.

The company said MolmoAct was trained on 22 hours of curated in-house data over three months. It proved that open, reasoning-based architectures could beat much larger closed models on industry-standard benchmarks. MolmoAct 2 continues that legacy and is built to work in real-world environments.

To create MolmoAct 2, the company rebuilt the architecture from the ground up. Rather than simply extending Molmo 2, its video understanding AI model, Ai2 built the new model on Molmo 2-ER, a specialized embodied reasoning variant of its foundation model. It was trained on more than 3 million examples of image-based pointing, object detection, abstract spatial reasoning, multi-image reasoning and image- and video-based spatial question answering.

That backbone is paired with a dedicated action expert inside the model that generates robotic actions through 3D reasoning. The company said the creation of the MolmoAct 2-Bimanual YAM dataset was foundational to this process. Bimanual refers to two robotic arms working together on coordinated tasks, such as folding towels, scanning groceries, charging a smartphone or clearing a table.
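As a rough illustration of that two-stage design, here is a minimal sketch: a vision-language backbone produces 3D scene reasoning, and a separate action expert decodes it into low-level commands. This is not Ai2's actual code; every class, field and number below is a hypothetical placeholder.

```python
# Hypothetical sketch of a backbone + action-expert pipeline, not Ai2's code.
from dataclasses import dataclass


@dataclass
class SceneReasoning:
    depth_tokens: list[int]               # coarse 3D structure of the scene
    waypoints: list[tuple[float, float]]  # planned path in image coordinates


class VisionLanguageBackbone:
    def reason(self, image: bytes, instruction: str) -> SceneReasoning:
        # A real model would predict depth tokens and a visual trace from the
        # image and instruction; fixed values keep the sketch runnable.
        return SceneReasoning(depth_tokens=[3, 1, 4],
                              waypoints=[(0.2, 0.5), (0.6, 0.4)])


class ActionExpert:
    def decode(self, plan: SceneReasoning) -> list[float]:
        # Map the mid-level plan to joint deltas (placeholder arithmetic).
        x, y = plan.waypoints[-1]
        return [x - 0.5, y - 0.5, 0.0]


def act(image: bytes, instruction: str) -> list[float]:
    plan = VisionLanguageBackbone().reason(image, instruction)
    return ActionExpert().decode(plan)


print(act(b"", "fold the towel"))  # -> a small joint-delta vector
```

The point of the split is that the backbone can be trained on broad visual reasoning data while the action expert specializes in turning plans into motor commands.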

With more than 700 hours of example data, Ai2 says it is the largest such dataset in the industry. The company supplemented it with a mix of additional robot datasets that expose MolmoAct 2 to different arms, camera setups, control schemes and task styles.

The researchers also improved the language side of the robot data, making the instructions more diverse and reducing repetition and low-quality annotations. To do this, they re-annotated the robot data library, roughly doubling the number of unique instruction labels from 71,000 to about 146,000.
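A minimal sketch of what such a cleanup pass could look like, under the assumption that it detects near-duplicate instruction phrasings and swaps in paraphrases; the paraphrase() helper is a hypothetical stand-in for whatever re-annotation model Ai2 actually used, which the article does not specify.

```python
def normalize(text: str) -> str:
    """Collapse case and whitespace so trivially repeated phrasings match."""
    return " ".join(text.lower().split())


def paraphrase(text: str) -> str:
    # Placeholder: a real pipeline would call a language model here.
    return f"please {text.lower()}"


def diversify(instructions: list[str]) -> list[str]:
    """Keep the first occurrence of each phrasing; paraphrase repeats."""
    seen: set[str] = set()
    out: list[str] = []
    for text in instructions:
        key = normalize(text)
        if key in seen:
            out.append(paraphrase(text))  # repeated phrasing: add variety
        else:
            seen.add(key)
            out.append(text)
    return out


print(diversify(["Pick up the cup", "pick up the cup", "Fold the towel"]))
# -> ['Pick up the cup', 'please pick up the cup', 'Fold the towel']
```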

MolmoAct 2 in the real world

The real test of a robotics AI model is how it behaves in actual environments. To find out, Ai2 ran a pilot with researchers at the Cong Lab at the Stanford University School of Medicine, led by Professor Le Cong, which focuses on wet-lab genetics research.

Cong Lab works with CRISPR, the gene-editing technique, but the process involves a great deal of benchtop work: moving between stations, pipetting samples and operating equipment with high precision. According to the researchers, errors can accumulate quickly and can ruin entire testing runs if the robot gets off track.

After testing a range of generalist AI models fine-tuned for the workflow, the Stanford team found that MolmoAct 2 showed strong potential to assist with wet-lab operations.

The company also said it stress-tested how MolmoAct 2 handled rephrased instructions, shifted object positions, distractor objects and object substitutions. These tests help Ai2's researchers better understand how the model handles changing conditions.
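A minimal sketch of a perturbation-style evaluation harness like the stress tests described, assuming the approach is to rerun the same task under each perturbation and compare success rates; run_episode() and the scene fields are hypothetical stand-ins for a real robot or simulator rollout.

```python
import random

def run_episode(instruction: str, scene: dict) -> bool:
    # Placeholder rollout: a real harness would drive the robot or a simulator.
    return random.random() < 0.8

PERTURBATIONS = {
    "baseline":     lambda s: dict(s),
    "shifted":      lambda s: {**s, "cup_xy": (s["cup_xy"][0] + 0.1, s["cup_xy"][1])},
    "distractor":   lambda s: {**s, "extra_objects": ["sponge"]},
    "substitution": lambda s: {**s, "cup": "mug"},
}

def stress_test(instruction: str, scene: dict, trials: int = 20) -> dict[str, float]:
    # Success rate per perturbation; a drop versus baseline flags brittleness.
    results: dict[str, float] = {}
    for name, perturb in PERTURBATIONS.items():
        wins = sum(run_episode(instruction, perturb(scene)) for _ in range(trials))
        results[name] = wins / trials
    return results

print(stress_test("put the cup on the tray", {"cup_xy": (0.4, 0.6)}))
```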

According to Ai2, the new model shows great potential but still has limitations. Like other robot systems, it can struggle when the gripper blocks the camera's view, when the arm cannot move as quickly as the control system requests, or when a task requires finer-grained manipulation than the hardware allows.

The company said openly addressing these challenges will build a shared foundation for the entire field to tackle them across all AI robotics models. Open models give researchers something to inspect, open datasets give them something to build on, and the company said it will soon release training code that can be adapted to new machines and situations.

Image: Ai2
