A toddler knows not to stack bigger blocks on smaller ones. A robot? Not so much. At least until now.
Traditional AI models excel at processing text and digital data, but struggle with basic physics that children grasp naturally. NVIDIA aims to change that with NVIDIA Cosmos, a new platform announced at CES 2025 that teaches machines how the physical world works.
The technology centers on "world models," AI systems that form internal representations of structure, dynamics and causal relationships. These models could transform how robots and autonomous vehicles navigate real-world environments and help in areas such as weather prediction and medicine.
"World models fundamentally change how systems perceive and interact with their environments," says Juan Bernabé-Moreno, Director of IBM Research in Europe for Ireland and the UK. "Rather than simply mapping inputs to outputs, these models form internal representations that capture structure, dynamics and causal relationships. It enables the handling of unstructured data more fluidly, adapting to unseen conditions and making inferences based on fewer direct examples or instructions."
The Cosmos platform includes foundation models that can generate physics-based simulations for training AI systems, along with advanced tools that NVIDIA says can process and label 20 million hours of video in just two weeks using its Blackwell platform—a task that would take over three years with traditional CPU processing.
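The scale of that claimed speedup can be sanity-checked with quick arithmetic. The snippet below treats "over three years" as roughly three years and assumes round-the-clock processing in both cases, so the numbers are illustrative rather than NVIDIA's own benchmark figures:

```python
# Rough speedup implied by NVIDIA's figures: labeling 20 million hours
# of video in two weeks on Blackwell vs. over three years on CPUs.
# Illustrative arithmetic only -- assumes continuous processing.
video_hours = 20_000_000

gpu_days = 14            # "two weeks"
cpu_days = 3 * 365       # "over three years", taken as ~3 years

speedup = cpu_days / gpu_days
hours_labeled_per_day = video_hours / gpu_days

print(f"Implied speedup: ~{speedup:.0f}x")
print(f"Video labeled per day on Blackwell: ~{hours_labeled_per_day:,.0f} hours")
```

By these rough figures, the Blackwell pipeline would be churning through well over a million hours of footage per day, roughly a 78x speedup over the CPU baseline.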
While other AI models generate text or images, Cosmos focuses on physics-based interactions in industrial and driving environments. Developers can customize the system with their data, like footage from warehouse robots or autonomous test drives. The platform has already attracted partners like Uber, which sees it as a potential fast track to autonomous vehicles.
NVIDIA is releasing the models under an open license through platforms like Hugging Face. CEO Jensen Huang calls it a potential "ChatGPT moment" for robotics, suggesting world foundation models could democratize physical AI, much like how large language models (LLMs) transformed text generation.
Armand Ruiz, a VP of Product at IBM Software focused on AI platforms, weighed in on the Cosmos project in a LinkedIn post, calling the robot-training system a "technical masterpiece." The open-source system, trained on 20 million hours of real-world footage, represents NVIDIA's attempt to create foundation models for robotic movement and interaction.
"The best is the project is Open Source!" Ruiz wrote, noting that Cosmos can simulate scenarios like boxes falling in warehouses and allows companies to customize training with their own data. The system works with NVIDIA's Isaac simulation platform, though its real-world performance remains to be tested.
IBM researchers used this concept in weather forecasting through their Prithvi-Climate-and-Weather foundation model. "It learned the physical dynamics of global processes of the atmospheric system," Moreno says. "It could be used for generating physical-compliant simulations and multi-granular forecasting tasks, as well as downscaling to multiple resolutions."
Three companies have already jumped into the sandbox: Uber, robot maker Figure AI and autonomous vehicle developer Waabi have signed on to implement the technology. The platform comes with an open model license for customization.
Meta's chief AI scientist, Yann LeCun, has explained that a world model is a system that observes its environment and predicts what might happen next, considering its current knowledge and unknown factors that could affect future outcomes. He notes that current AI language models use a simpler version of this approach: they only look at past information to make predictions, without accounting for different possible actions or unknown variables.
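LeCun's distinction can be illustrated with a toy sketch. In the hypothetical example below (not Meta's or NVIDIA's code), a world model maps a current state plus a candidate action to a predicted next state, which lets it compare actions before committing to one; a past-only sequence predictor has no such action input:

```python
# Toy world model for a block sliding on a 1-D surface: given the current
# state and a candidate action, predict the next state. Hypothetical
# illustration -- real world models learn these dynamics from data.
from dataclasses import dataclass

@dataclass
class State:
    position: float   # metres
    velocity: float   # metres per second

def world_model_step(state: State, push_force: float, dt: float = 0.1,
                     mass: float = 1.0, friction: float = 0.5) -> State:
    """Predict the next state under simple Newtonian dynamics."""
    # Friction opposes motion; net force gives acceleration (F = m * a).
    friction_force = -friction * state.velocity
    accel = (push_force + friction_force) / mass
    new_velocity = state.velocity + accel * dt
    new_position = state.position + new_velocity * dt
    return State(new_position, new_velocity)

# Evaluate two candidate actions *before* acting -- the planning step
# that a past-only sequence predictor cannot perform.
start = State(position=0.0, velocity=0.0)
gentle = world_model_step(start, push_force=1.0)
hard = world_model_step(start, push_force=10.0)
print(f"gentle push -> v = {gentle.velocity:.2f} m/s")
print(f"hard push   -> v = {hard.velocity:.2f} m/s")
```

The key feature is the action argument: the same model can be queried with many hypothetical pushes and the results compared, which is what makes planning possible.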
The world model’s ability to simulate scenarios before real-world implementation can save enterprises money and avert costly mishaps in robotics.
"World models allow machines to plan movements and interactions in simulated spaces, often called 'digital twins,' before attempting them in the physical world," Moreno says. "This dramatically reduces costly trial and error, mitigates safety risks and accelerates learning for tasks such as industrial assembly, warehouse logistics or service-oriented robotics."
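The simulate-before-you-act pattern Moreno describes can be sketched in a few lines. This is a deliberately simplified, hypothetical check (production systems use full physics engines such as NVIDIA's Isaac Sim): candidate plans are tested against a crude digital twin, and only plans that pass are sent to the real robot:

```python
# Minimal "simulate before you act" pattern: vet candidate plans in a
# crude digital-twin check before committing the physical robot.
# Hypothetical sketch, not a real simulation API.
def simulate_carry(path, max_height, obstacle_heights):
    """Return True if a planned path clears every obstacle in simulation."""
    for waypoint, obstacle in zip(path, obstacle_heights):
        if waypoint > max_height or waypoint <= obstacle:
            return False  # plan exceeds reach or collides with an obstacle
    return True

obstacles = [0.5, 1.2, 0.8]      # shelf heights along the route (metres)
risky_plan = [0.6, 1.0, 0.9]     # dips below the 1.2 m shelf -> collision
safe_plan = [0.6, 1.5, 0.9]      # clears every shelf

for name, plan in [("risky", risky_plan), ("safe", safe_plan)]:
    ok = simulate_carry(plan, max_height=2.0, obstacle_heights=obstacles)
    print(f"{name} plan: {'execute' if ok else 'reject'}")
```

The trial and error happens entirely in software: the risky plan is rejected in simulation at zero cost, rather than discovered the hard way on the warehouse floor.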
Moreno points out that these same simulation principles have also caught the attention of medical researchers, who spotted opportunities in drug development and disease treatment.
"In healthcare, world models unify data from multiple domains—genomic, proteomic, transcriptomic and chemical—to capture the complexities of biological systems at scale," Moreno says. "This holistic view empowers researchers and clinicians to uncover hidden patterns in large biomedical datasets, enabling tasks like gene perturbation prediction, disease state classification and therapy-response modeling."
However, achieving these ambitious healthcare applications requires extraordinary computing resources. Training these models demands massive processing power and data resources, even with specialized hardware. The first batch of Cosmos models hits NVIDIA's API catalog this year, alongside tools for processing video data.
The investment in computing muscle could open new doors across industries. Through AI world models, organizations can create virtual twins of their operations to safely test significant changes before implementation. These sophisticated simulations allow companies to experiment with different setups—whether they're planning a new warehouse layout or adding robots to their workflow—without disrupting their real-world business.
"Traditional gen AI approaches typically operate on textual or purely digital data, lacking the capacity to reason about physical objects and forces," Moreno says. "By encoding the rules that govern real-world interactions, world models can simulate and predict outcomes beyond text or images."