Summary

This survey examines the growing role of world models in robotic manipulation through three organizing questions: what future representation is predicted (pixels, latents, object states), how prediction is connected to action (implicit vs. explicit coupling), and when prediction is used in the robot-learning pipeline (pretraining, data augmentation, planning, evaluation). It provides a taxonomy of current approaches and highlights open challenges.
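One way to make the three axes concrete is to treat each surveyed approach as a point in a small categorical space. The sketch below is illustrative only and is not taken from the survey: the class names, the `filter_entries` helper, and the entries `method_a` and `method_b` are hypothetical placeholders, and the axis values are simply the ones listed above.

```python
from dataclasses import dataclass
from enum import Enum


class Representation(Enum):
    """What future representation is predicted."""
    PIXELS = "pixels"
    LATENTS = "latents"
    OBJECT_STATES = "object states"


class Coupling(Enum):
    """How prediction is connected to action."""
    IMPLICIT = "implicit"
    EXPLICIT = "explicit"


class Stage(Enum):
    """When prediction is used in the robot-learning pipeline."""
    PRETRAINING = "pretraining"
    DATA_AUGMENTATION = "data augmentation"
    PLANNING = "planning"
    EVALUATION = "evaluation"


@dataclass(frozen=True)
class TaxonomyEntry:
    """One approach placed on the three axes (hypothetical record type)."""
    name: str
    representation: Representation
    coupling: Coupling
    stage: Stage


def filter_entries(entries, *, representation=None, coupling=None, stage=None):
    """Return entries matching every axis value that is not None."""
    return [
        e for e in entries
        if (representation is None or e.representation == representation)
        and (coupling is None or e.coupling == coupling)
        and (stage is None or e.stage == stage)
    ]


# Placeholder entries for illustration; these are not real methods.
EXAMPLES = [
    TaxonomyEntry("method_a", Representation.PIXELS,
                  Coupling.IMPLICIT, Stage.DATA_AUGMENTATION),
    TaxonomyEntry("method_b", Representation.LATENTS,
                  Coupling.EXPLICIT, Stage.PLANNING),
]

planning_methods = filter_entries(EXAMPLES, stage=Stage.PLANNING)
```

Keeping the axes as independent enumerations reflects the idea that the three questions can be answered separately, so any combination of values is a valid point in the design space.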

Key Contributions

  • Unified three-axis taxonomy: representation predicted, coupling to action, and stage of use in the pipeline
  • Comprehensive review of world-model-based data augmentation, model-based RL, and test-time planning for manipulation
  • Analysis of evaluation protocols and benchmarks across simulation and real-robot settings
  • Identification of key gaps: physical consistency, long-horizon coherence, and real-world calibration

Significance

As world models proliferate across robotics research, this survey gives practitioners a structured map of the design space, helping them choose the predicted representation, the coupling to action, and the pipeline stage suited to their manipulation tasks.