World models help AI learn what five-year-olds know about gravity

Author

Sascha Brodsky

Staff Writer

IBM

A toddler knows not to stack bigger blocks on smaller ones. A robot? Not so much. At least until now.

Traditional AI models excel at processing text and digital data, but struggle with basic physics that children grasp naturally. NVIDIA aims to change that with NVIDIA Cosmos, a new platform announced at CES 2025 that teaches machines how the physical world works.

The technology centers on "world models," AI systems that form internal representations of structure, dynamics and causal relationships. These models could transform how robots and autonomous vehicles navigate real-world environments and help in areas such as weather prediction and medicine.

"World models fundamentally change how systems perceive and interact with their environments," says Juan Bernabé-Moreno, Director of IBM Research in Europe for Ireland and the UK. "Rather than simply mapping inputs to outputs, these models form internal representations that capture structure, dynamics and causal relationships. It enables the handling of unstructured data more fluidly, adapting to unseen conditions and making inferences based on fewer direct examples or instructions."

Teaching robots to play with blocks

The Cosmos platform includes foundation models that can generate physics-based simulations for training AI systems, along with advanced tools that NVIDIA says can process and label 20 million hours of video in just two weeks using its Blackwell platform—a task that would take over three years with traditional CPU processing.
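
As a rough sanity check on those figures, the arithmetic can be worked through in a few lines. This is only a back-of-the-envelope sketch: it assumes "two weeks" means 14 days and treats "over three years" as a lower bound of 3 × 365 days, and the variable names are illustrative rather than anything from NVIDIA.

```python
# Back-of-the-envelope check of the throughput figures quoted above.
# Assumptions: "two weeks" = 14 days; "over three years" >= 3 * 365 days.
VIDEO_HOURS = 20_000_000   # hours of video, as quoted
GPU_DAYS = 14              # Blackwell-based processing time, as quoted
CPU_DAYS = 3 * 365         # lower bound for the CPU baseline

gpu_hours_per_day = VIDEO_HOURS / GPU_DAYS   # video hours processed per day
speedup = CPU_DAYS / GPU_DAYS                # minimum implied speedup

print(f"{gpu_hours_per_day:,.0f} video hours processed per day")
print(f"at least {speedup:.0f}x faster than the CPU baseline")
```

Under those assumptions, the claim works out to roughly 1.4 million hours of video per day and a speedup of at least 78 times.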

While other AI models generate text or images, Cosmos focuses on physics-based interactions in industrial and driving environments. Developers can customize the system with their data, like footage from warehouse robots or autonomous test drives. The platform has already attracted partners like Uber, which sees it as a potential fast track to autonomous vehicles.

NVIDIA is releasing the models under an open license through platforms like Hugging Face. CEO Jensen Huang calls it a potential "ChatGPT moment" for robotics, suggesting world foundation models could democratize physical AI, much like how large language models (LLMs) transformed text generation.

Armand Ruiz, a VP of Product at IBM Software focused on AI platforms, weighed in on the Cosmos project in a LinkedIn post, calling the robot-training system a "technical masterpiece." The open-source system, trained on 20 million hours of real-world footage, represents NVIDIA's attempt to create foundation models for robotic movement and interaction.

"The best is the project is Open Source!" Ruiz wrote, noting that Cosmos can simulate scenarios like boxes falling in warehouses and allows companies to customize training with their own data. The system works with NVIDIA's Isaac simulation platform, though its real-world performance remains to be tested.

IBM researchers have applied the world-model concept to weather forecasting through their Prithvi-Climate-and-Weather foundation model. "It learned the physical dynamics of global processes of the atmospheric system," Moreno says. "It could be used for generating physical-compliant simulations and multi-granular forecasting tasks, as well as downscaling to multiple resolutions."

Three companies have jumped into the sandbox and signed on to implement the technology: Uber, robot maker Figure AI and autonomous vehicle developer Waabi. The platform comes with an open model license for customization.

From virtual stumbles to real-world grace

Meta's chief AI scientist, Yann LeCun, has explained that a world model is a system that observes its environment and predicts what might happen next, considering its current knowledge and unknown factors that could affect future outcomes. He notes that current AI language models use a simpler version of this approach: they only look at past information to make predictions, without accounting for different possible actions or unknown variables.
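
To make that description concrete, here is a deliberately tiny sketch of the idea: a function that predicts the next state from the current state, a chosen action and an unknown factor. Everything in it is an assumption made for illustration (the falling-block physics, the `State` class, the `predict` and `sample_futures` names, the Gaussian noise); it is not LeCun's formulation in code and has nothing to do with how Cosmos is built.

```python
import random
from dataclasses import dataclass

GRAVITY = -9.81  # m/s^2; the only "physics" this toy model knows


@dataclass(frozen=True)
class State:
    height: float    # metres above the floor
    velocity: float  # metres per second, positive is up


def predict(state, action, unknown, dt=0.1):
    """Toy world model: next state from current state, action and unknown factor.

    `action` is an upward acceleration the agent chooses; `unknown` stands in
    for anything the agent cannot observe, such as a gust of air.
    """
    acceleration = GRAVITY + action + unknown
    velocity = state.velocity + acceleration * dt
    height = max(0.0, state.height + velocity * dt)
    if height == 0.0:
        velocity = 0.0  # the block has landed and stays put
    return State(height, velocity)


def sample_futures(state, action, n=5, seed=0):
    """Sample several plausible next states by varying the unknown factor."""
    rng = random.Random(seed)
    return [predict(state, action, rng.gauss(0.0, 1.0)) for _ in range(n)]


start = State(height=1.0, velocity=0.0)
no_action = predict(start, action=0.0, unknown=0.0)  # the block starts to fall
hover = predict(start, action=9.81, unknown=0.0)     # thrust cancels gravity
futures = sample_futures(start, action=0.0)          # a spread of outcomes
```

The contrast drawn above shows up in the function signature: a model that only extrapolates from the past would take `state` alone, while this one also conditions on what the agent does and on what it cannot see.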

A world model's ability to simulate scenarios before real-world implementation can spare enterprises both expense and mishaps in robotics.

"World models allow machines to plan movements and interactions in simulated spaces, often called 'digital twins,' before attempting them in the physical world," Moreno says. "This dramatically reduces costly trial and error, mitigates safety risks and accelerates learning for tasks such as industrial assembly, warehouse logistics or service-oriented robotics."

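The simulate-before-acting idea can be sketched with a toy planner that tries candidate moves in a model first and only then picks one. The scenario is invented for illustration: a robot pushes a box along a table, and the friction value, table length and function names (`simulate_push`, `plan_push`) are all assumptions, not part of any real robotics stack.

```python
FRICTION_DECEL = 2.0  # m/s^2, assumed constant sliding friction
TABLE_LENGTH = 1.0    # metres; past this point the box falls off


def simulate_push(speed, dt=0.01):
    """Roll the toy model forward: where the box stops, or None if it falls."""
    position = 0.0
    while speed > 0.0:
        position += speed * dt
        speed -= FRICTION_DECEL * dt
        if position > TABLE_LENGTH:
            return None  # the mishap happens in simulation, not on the floor
    return position


def plan_push(target, candidates):
    """Try every candidate push in simulation; keep the safe one nearest target."""
    best_speed, best_error = None, float("inf")
    for speed in candidates:
        stop = simulate_push(speed)
        if stop is None:
            continue  # discard pushes that would send the box off the table
        error = abs(stop - target)
        if error < best_error:
            best_speed, best_error = speed, error
    return best_speed


candidates = [0.5, 1.0, 1.5, 2.0, 2.5]  # initial push speeds in m/s
best = plan_push(target=0.8, candidates=candidates)
print(f"chosen push speed: {best} m/s")
```

With these made-up numbers the planner settles on 1.5 m/s: the two harder pushes send the box over the edge in simulation, so they are ruled out before anything is tried on real hardware.
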
Moreno points out that these same simulation principles have also caught the attention of medical researchers, who spotted opportunities in drug development and disease treatment.

"In healthcare, world models unify data from multiple domains—genomic, proteomic, transcriptomic and chemical—to capture the complexities of biological systems at scale," Moreno says. "This holistic view empowers researchers and clinicians to uncover hidden patterns in large biomedical datasets, enabling tasks like gene perturbation prediction, disease state classification and therapy-response modeling."

However, achieving these ambitious healthcare applications requires extraordinary computing resources. Training these models demands massive processing power and data resources, even with specialized hardware. The first batch of Cosmos models hits NVIDIA's API catalog this year, alongside tools for processing video data.

The investment in computing muscle could open new doors across industries. Through AI world models, organizations can create virtual twins of their operations to safely test significant changes before implementation. These simulations let companies experiment with different setups, whether they're planning a new warehouse layout or adding robots to their workflow, without disrupting their real-world business.

"Traditional gen AI approaches typically operate on textual or purely digital data, lacking the capacity to reason about physical objects and forces," Moreno says. "By encoding the rules that govern real-world interactions, world models can simulate and predict outcomes beyond text or images."
