A toddler knows not to stack bigger blocks on smaller ones. A robot? Not so much. At least until now.
Traditional AI models excel at processing text and digital data, but struggle with basic physics that children grasp naturally. NVIDIA aims to change that with NVIDIA Cosmos, a new platform announced at CES 2025 that teaches machines how the physical world works.
The technology centers on "world models," AI systems that form internal representations of structure, dynamics and causal relationships. These models could transform how robots and autonomous vehicles navigate real-world environments and help in areas such as weather prediction and medicine.
"World models fundamentally change how systems perceive and interact with their environments," says Juan Bernabé-Moreno, Director of IBM Research in Europe for Ireland and the UK. "Rather than simply mapping inputs to outputs, these models form internal representations that capture structure, dynamics and causal relationships. It enables the handling of unstructured data more fluidly, adapting to unseen conditions and making inferences based on fewer direct examples or instructions."
The Cosmos platform includes foundation models that can generate physics-based simulations for training AI systems, along with advanced tools that NVIDIA says can process and label 20 million hours of video in just two weeks using its Blackwell platform—a task that would take over three years with traditional CPU processing.
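The scale of that claimed speedup can be sanity-checked with quick arithmetic. The snippet below treats "over three years" as roughly three years and assumes round-the-clock processing in both cases, so the numbers are illustrative rather than NVIDIA's own benchmark figures:

```python
# Rough speedup implied by NVIDIA's figures: labeling 20 million hours
# of video in two weeks on Blackwell vs. over three years on CPUs.
# Illustrative arithmetic only -- assumes continuous processing.
video_hours = 20_000_000

gpu_days = 14            # "two weeks"
cpu_days = 3 * 365       # "over three years", taken as ~3 years

speedup = cpu_days / gpu_days
hours_labeled_per_day = video_hours / gpu_days

print(f"Implied speedup: ~{speedup:.0f}x")
print(f"Video labeled per day on Blackwell: ~{hours_labeled_per_day:,.0f} hours")
```

By these rough figures, the Blackwell pipeline would be churning through well over a million hours of footage per day, roughly a 78x speedup over the CPU baseline.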
While other AI models generate text or images, Cosmos focuses on physics-based interactions in industrial and driving environments. Developers can customize the system with their data, like footage from warehouse robots or autonomous test drives. The platform has already attracted partners like Uber, which sees it as a potential fast track to autonomous vehicles.
NVIDIA is releasing the models under an open license through platforms like Hugging Face. CEO Jensen Huang calls it a potential "ChatGPT moment" for robotics, suggesting world foundation models could democratize physical AI, much like how large language models (LLMs) transformed text generation.
Armand Ruiz, a VP of Product at IBM Software focused on AI platforms, weighed in on the Cosmos project in a LinkedIn post, calling the robot-training system a "technical masterpiece." The open-source system, trained on 20 million hours of real-world footage, represents NVIDIA's attempt to create foundation models for robotic movement and interaction.
"The best is the project is Open Source!" Ruiz wrote, noting that Cosmos can simulate scenarios like boxes falling in warehouses and allows companies to customize training with their own data. The system works with NVIDIA's Isaac simulation platform, though its real-world performance remains to be tested.
IBM researchers used this concept in weather forecasting through their Prithvi-Climate-and-Weather foundation model. "It learned the physical dynamics of global processes of the atmospheric system," Moreno says. "It could be used for generating physical-compliant simulations and multi-granular forecasting tasks, as well as downscaling to multiple resolutions."
Three companies have already jumped into the sandbox: Uber, robot maker Figure AI and autonomous vehicle developer Waabi have signed on to implement the technology. The platform comes with an open model license for customization.
Meta's chief AI scientist, Yann LeCun, has explained that a world model is a system that observes its environment and predicts what might happen next, considering its current knowledge and unknown factors that could affect future outcomes. He notes that current AI language models use a simpler version of this approach: they only look at past information to make predictions, without accounting for different possible actions or unknown variables.
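LeCun's distinction can be illustrated with a toy sketch. In the hypothetical example below (not Meta's or NVIDIA's code), a world model maps a current state plus a candidate action to a predicted next state, which lets it compare actions before committing to one; a past-only sequence predictor has no such action input:

```python
# Toy world model for a block sliding on a 1-D surface: given the current
# state and a candidate action, predict the next state. Hypothetical
# illustration -- real world models learn these dynamics from data.
from dataclasses import dataclass

@dataclass
class State:
    position: float   # metres
    velocity: float   # metres per second

def world_model_step(state: State, push_force: float, dt: float = 0.1,
                     mass: float = 1.0, friction: float = 0.5) -> State:
    """Predict the next state under simple Newtonian dynamics."""
    # Friction opposes motion; net force gives acceleration (F = m * a).
    friction_force = -friction * state.velocity
    accel = (push_force + friction_force) / mass
    new_velocity = state.velocity + accel * dt
    new_position = state.position + new_velocity * dt
    return State(new_position, new_velocity)

# Evaluate two candidate actions *before* acting -- the planning step
# that a past-only sequence predictor cannot perform.
start = State(position=0.0, velocity=0.0)
gentle = world_model_step(start, push_force=1.0)
hard = world_model_step(start, push_force=10.0)
print(f"gentle push -> v = {gentle.velocity:.2f} m/s")
print(f"hard push   -> v = {hard.velocity:.2f} m/s")
```

The key feature is the action argument: the same model can be queried with many hypothetical pushes and the results compared, which is what makes planning possible.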
The world model’s ability to simulate scenarios before real-world implementation can save enterprises money and avert costly mishaps in robotics.
"World models allow machines to plan movements and interactions in simulated spaces, often called 'digital twins,' before attempting them in the physical world," Moreno says. "This dramatically reduces costly trial and error, mitigates safety risks and accelerates learning for tasks such as industrial assembly, warehouse logistics or service-oriented robotics."
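The simulate-before-you-act pattern Moreno describes can be sketched in a few lines. This is a deliberately simplified, hypothetical check (production systems use full physics engines such as NVIDIA's Isaac Sim): candidate plans are tested against a crude digital twin, and only plans that pass are sent to the real robot:

```python
# Minimal "simulate before you act" pattern: vet candidate plans in a
# crude digital-twin check before committing the physical robot.
# Hypothetical sketch, not a real simulation API.
def simulate_carry(path, max_height, obstacle_heights):
    """Return True if a planned path clears every obstacle in simulation."""
    for waypoint, obstacle in zip(path, obstacle_heights):
        if waypoint > max_height or waypoint <= obstacle:
            return False  # plan exceeds reach or collides with an obstacle
    return True

obstacles = [0.5, 1.2, 0.8]      # shelf heights along the route (metres)
risky_plan = [0.6, 1.0, 0.9]     # dips below the 1.2 m shelf -> collision
safe_plan = [0.6, 1.5, 0.9]      # clears every shelf

for name, plan in [("risky", risky_plan), ("safe", safe_plan)]:
    ok = simulate_carry(plan, max_height=2.0, obstacle_heights=obstacles)
    print(f"{name} plan: {'execute' if ok else 'reject'}")
```

The trial and error happens entirely in software: the risky plan is rejected in simulation at zero cost, rather than discovered the hard way on the warehouse floor.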
Moreno points out that these same simulation principles have also caught the attention of medical researchers, who spotted opportunities in drug development and disease treatment.
"In healthcare, world models unify data from multiple domains—genomic, proteomic, transcriptomic and chemical—to capture the complexities of biological systems at scale," Moreno says. "This holistic view empowers researchers and clinicians to uncover hidden patterns in large biomedical datasets, enabling tasks like gene perturbation prediction, disease state classification and therapy-response modeling."
However, achieving these ambitious healthcare applications requires extraordinary computing resources. Training these models demands massive processing power and data resources, even with specialized hardware. The first batch of Cosmos models hits NVIDIA's API catalog this year, alongside tools for processing video data.
The investment in computing muscle could open new doors across industries. Through AI world models, organizations can create virtual twins of their operations to safely test significant changes before implementation. These sophisticated simulations allow companies to experiment with different setups—whether they're planning a new warehouse layout or adding robots to their workflow—without disrupting their real-world business.
"Traditional gen AI approaches typically operate on textual or purely digital data, lacking the capacity to reason about physical objects and forces," Moreno says. "By encoding the rules that govern real-world interactions, world models can simulate and predict outcomes beyond text or images."