World Models for Robotic Manipulation: A Survey

Summary

This survey examines the growing role of world models in robotic manipulation through three organizing questions: what future representation is predicted (pixels, latents, object states), how prediction is connected to action (implicit vs. explicit coupling), and when prediction is used in the robot-learning pipeline (pretraining, data augmentation, planning, evaluation). It provides a taxonomy of current approaches and highlights open challenges.
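The three organizing questions can be read as three independent axes along which any method is tagged. The sketch below is only an illustration of that reading, not the survey's own schema: the category values are taken from the summary above, while the class names, the `filter_by_stage` helper, and the example entry are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Representation(Enum):
    """What future representation is predicted."""
    PIXELS = "pixels"
    LATENTS = "latents"
    OBJECT_STATES = "object states"


class Coupling(Enum):
    """How prediction is connected to action."""
    IMPLICIT = "implicit"
    EXPLICIT = "explicit"


class Stage(Enum):
    """When prediction is used in the robot-learning pipeline."""
    PRETRAINING = "pretraining"
    DATA_AUGMENTATION = "data augmentation"
    PLANNING = "planning"
    EVALUATION = "evaluation"


@dataclass(frozen=True)
class TaxonomyEntry:
    """One method, tagged along the three axes (hypothetical schema)."""
    name: str
    representation: Representation
    coupling: Coupling
    stages: frozenset  # a method may be used at more than one stage


def filter_by_stage(entries, stage):
    """Return the entries that use their world model at the given stage."""
    return [e for e in entries if stage in e.stages]


# Placeholder entry for illustration only; it does not describe a real method.
example = TaxonomyEntry(
    name="hypothetical-video-planner",
    representation=Representation.PIXELS,
    coupling=Coupling.EXPLICIT,
    stages=frozenset({Stage.PLANNING}),
)
```

Keeping the axes as separate enumerations reflects the assumption that they vary independently, so a collection of entries can be filtered along any one axis without touching the others.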

Key Contributions

Unified three-axis taxonomy: representation predicted, coupling to action, and stage of use in the pipeline
Comprehensive review of world-model-based data augmentation, model-based RL, and test-time planning for manipulation
Analysis of evaluation protocols and benchmarks across simulation and real-robot settings
Identification of key gaps: physical consistency, long-horizon coherence, and real-world calibration

Significance

As world models proliferate across robotics research, this survey gives practitioners a structured overview of the design space and helps them select approaches suited to their manipulation tasks.
