

Nvidia Corp. today unveiled new technologies and artificial intelligence models that help developers more quickly build smarter robots, including humanoids, and self-driving vehicles by automating the complex modeling and data work.
These include Nvidia Isaac GR00T N1, the world’s first open-source, fully customizable generalist foundation model for humanoid robot brains; the GR00T blueprint for generating synthetic data; and Newton, an open-source physics engine purpose-built for developing robots. The company also announced a major release of new Nvidia Cosmos world foundation models, with customizable reasoning for physical AI development and world generation.
Physical AI refers to the integration of artificial intelligence into physical systems such as robots and other machines to allow them to perceive and react to the real world. It combines AI algorithms with physical hardware that can sense, act and adapt to changing conditions. Emerging modern-day examples include robotics systems and autonomous vehicles.
“The age of generalist robotics is here,” said co-founder and Chief Executive Jensen Huang. “With Nvidia Isaac GR00T N1 and new data-generation and robot-learning frameworks, robotics developers everywhere will open the next frontier in the age of AI.”
As humanoid robot adoption accelerates, developers will be challenged to train AI models on tasks and test them at scale. GR00T N1 can easily generalize across common robot tasks – such as grasping, moving objects with one or both arms and transferring items from one arm to the other – and it can also chain these skills together to perform multistep tasks.
It’s also designed to be easily post-trained with real or synthetic data for industry-specific skills. In his keynote at this year’s Nvidia GTC, Huang demonstrated a humanoid robot from the startup 1X tidying up using GR00T N1 AI training policies.
Nvidia GR00T N1 training data and task evaluation datasets are open-source and available for download from Hugging Face and GitHub.
In collaboration with Google DeepMind, Google LLC’s AI research lab, and Disney Research, Nvidia developed Newton, an open-source physics engine that allows robots to learn how to handle complex tasks with greater precision. Newton is built on the Nvidia Warp framework. DeepMind and Nvidia are also collaborating on the simulation framework MuJoCo-Warp, which is expected to accelerate robotics development by more than 70x by combining DeepMind’s open-source MJX library with Newton.
Robotics requires massive amounts of training data, which is extremely costly to capture. For humanoids, real-world data capture must be done by demonstration from actual people. To address this, Nvidia announced the Isaac GR00T Blueprint for synthetic manipulation motion generation. Using the blueprint, Nvidia generated 780,000 synthetic trajectories – the company said this is equivalent to 6,500 hours, or nine continuous months, of human demonstration data – in just 11 hours.
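The reported figures hang together, as a quick back-of-the-envelope check shows (plain arithmetic on the numbers in the article, nothing Nvidia-specific):

```python
# Sanity-check the reported synthetic-data figures.
trajectories = 780_000   # synthetic trajectories generated
demo_hours = 6_500       # claimed human-demonstration equivalent
generation_hours = 11    # wall-clock time using the blueprint

# 6,500 hours of continuous demonstration, expressed in months
days = demo_hours / 24          # about 271 days
months = days / 30              # about 9 months, matching the article

# Implied average length of one demonstrated trajectory
seconds_per_trajectory = demo_hours * 3600 / trajectories  # about 30 s

# Effective speed-up over collecting the data by hand
speedup = demo_hours / generation_hours  # roughly 590x

print(round(months), round(seconds_per_trajectory), round(speedup))
```

The numbers imply trajectories averaging about 30 seconds each, collected some 590 times faster than live human demonstration.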
World foundation models, or WFMs, empower developers and engineers to create virtual training grounds where robots learn to navigate real-world challenges through simulated environments so they can be trained across various scenarios.
As part of today’s announcements, Nvidia introduced a family of new AI models within the company’s Cosmos WFMs, bringing breakthroughs in synthetic data generation, high-fidelity world generation and multimodal reasoning.
“Just as large language models revolutionized generative and agentic AI, Cosmos world foundation models are a breakthrough for physical AI,” said Huang. “Cosmos introduces an open and fully customizable reasoning model for physical AI and unlocks opportunities for step-function advances in robotics and the physical industries.”
Cosmos Transfer WFMs can produce vast amounts of synthetic data for robotic training by ingesting structured video inputs such as segmentation maps, depth maps, lidar scans, pose estimations and trajectories to generate photoreal video outputs. The model streamlines perception AI training by allowing developers to produce photorealistic videos at the scale needed for large training datasets.
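Conceptually, those "structured video inputs" are per-pixel channels aligned with each frame. A toy NumPy sketch of stacking a segmentation map and a depth map into a single conditioning tensor, the kind of aligned multimodal input such a generative model ingests (the shapes and channel layout here are illustrative assumptions, not the Cosmos API):

```python
import numpy as np

H, W = 4, 6  # tiny frame for illustration; real frames are e.g. 720p

# Hypothetical per-pixel structured inputs for one video frame
segmentation = np.random.randint(0, 10, size=(H, W))  # class id per pixel
depth = np.random.rand(H, W).astype(np.float32)       # normalized depth

# One-hot encode the segmentation so every channel is continuous
num_classes = 10
seg_onehot = np.eye(num_classes, dtype=np.float32)[segmentation]  # (H, W, 10)

# Append depth as one more channel -> an (H, W, 11) conditioning tensor
conditioning = np.concatenate([seg_onehot, depth[..., None]], axis=-1)

print(conditioning.shape)
```

A video clip would simply add a leading time axis; the key property is that every conditioning signal stays spatially registered with the RGB output being generated.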
“Cosmos offers us an opportunity to scale our photorealistic training data beyond what we can feasibly collect in the real world,” said Pras Velagapudi, chief technology officer of Agility Robotics Inc., an American humanoid robotics company.
Foretellix Ltd., a maker of verification and validation solutions for self-driving car developers, uses Cosmos Transfer with the Nvidia Omniverse Blueprint for autonomous vehicle simulation to create variations of physically based sensor data at scale. It can produce numerous conditions, including severe weather, poor driving conditions, twilight and other road hazards, for the driving robot to train with.
Cosmos Predict, announced during the CES trade show in January, can generate virtual worlds from text, images and video. The newest models allow developers to input a first frame and a last frame of video and predict the frames in between, generating all of the intermediate actions and motions that took place. It is purpose-built for post-training physical AI models, giving them the ability to understand what happened over the course of a video.
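The first-frame/last-frame idea can be pictured with a toy linear cross-fade between two frames; the real model synthesizes physically plausible motion rather than a pixel blend, but the input/output shape of the problem is the same (purely illustrative NumPy sketch):

```python
import numpy as np

def interpolate_frames(first, last, n_between):
    """Toy stand-in for in-between prediction: a linear pixel blend.
    A learned world model would instead predict plausible motion."""
    frames = []
    for i in range(1, n_between + 1):
        t = i / (n_between + 1)  # interpolation weight, 0 < t < 1
        frames.append((1 - t) * first + t * last)
    return frames

first = np.zeros((2, 2))  # all-black first frame
last = np.ones((2, 2))    # all-white last frame
mid = interpolate_frames(first, last, 3)
print([f.mean() for f in mid])  # brightness ramps 0.25, 0.5, 0.75
```

The value of a learned in-betweener over this naive blend is exactly what the article describes: the intermediate frames depict the actions and motions that plausibly connect the endpoints, not a fade.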
Robotics developers are already using Cosmos Predict and Transfer: 1X is using the models to train its new humanoid robot Neo Gamma, and robot brain developer Skild AI Inc. is using Transfer to generate synthetic training data.
Cosmos Reason provides a fully customizable WFM that uses chain-of-thought reasoning to understand video data and predict the outcomes of interactions through natural language processing. For example, it can understand what might happen next from watching a person stepping onto a crosswalk or a box falling from a shelf.
Developers can distill the Reason model to enhance existing world foundation models or create new vision-language-action models for robotics. It can also be used to improve physical AI data annotation and curation, which can help produce accurate datasets for understanding the world. And it can post-train high-level planners for the multistep processes needed to orchestrate physical AI to complete tasks.