Summary

This position paper proposes Hamiltonian World Models as a unified, physically grounded framework for world modeling that spans three currently fragmented approaches: 2D video-generative models, 3D scene-centric models, and JEPA-like latent models. The core idea is to encode observations into a structured latent phase space, evolve that latent state through Hamiltonian-inspired dynamics with control, dissipation, and residual terms, and decode the predicted trajectories into future observations for planning.
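The encode, evolve, decode loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `encode`, `decode`, `grad_H`, and `latent_step` are hypothetical, the encoder and decoder are trivial placeholders for learned networks, and the Hamiltonian is a toy quadratic rather than a learned energy function. Control, dissipation, and the residual enter only the momentum update, which is one common port-Hamiltonian-style choice.

```python
import numpy as np

# Hypothetical stand-ins for learned components (not the paper's architecture).
def encode(obs):
    """Map an observation to a latent phase-space state (q, p) by splitting it."""
    half = obs.shape[0] // 2
    return obs[:half].copy(), obs[half:].copy()

def decode(q, p):
    """Map a latent state back to observation space (inverse of encode)."""
    return np.concatenate([q, p])

def grad_H(q, p):
    """Gradients (dH/dq, dH/dp) of the assumed toy H = 0.5*|p|^2 + 0.5*|q|^2."""
    return q, p

def energy(q, p):
    return 0.5 * float(p @ p + q @ q)

def latent_step(q, p, u, dt=0.01, damping=0.0, residual=None):
    """One semi-implicit (symplectic) Euler step:
    dp/dt = -dH/dq - damping * dH/dp + u + residual(q, p),  dq/dt = dH/dp.
    """
    dH_dq, dH_dp = grad_H(q, p)
    r = residual(q, p) if residual is not None else np.zeros_like(p)
    p_next = p + dt * (-dH_dq - damping * dH_dp + u + r)
    _, dH_dp_next = grad_H(q, p_next)  # position update uses the new momentum
    q_next = q + dt * dH_dp_next
    return q_next, p_next

def rollout(obs0, controls, **kwargs):
    """Encode once, evolve in latent space, decode every predicted state."""
    q, p = encode(obs0)
    preds = []
    for u in controls:
        q, p = latent_step(q, p, u, **kwargs)
        preds.append(decode(q, p))
    return np.stack(preds)

obs0 = np.array([1.0, 0.0])        # initial observation: q = 1, p = 0
controls = np.zeros((1000, 1))     # zero control input for 1000 steps
free = rollout(obs0, controls)                   # conservative rollout
damped = rollout(obs0, controls, damping=0.5)    # rollout with dissipation

print("final energy, conservative:", energy(*encode(free[-1])))
print("final energy, damped:      ", energy(*encode(damped[-1])))
```

In this toy setting the conservative rollout keeps its energy close to the initial value, while the damped rollout loses energy over time; a planner would pass candidate control sequences as `controls` and score the decoded predictions.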

Key Contributions

  • Theoretical framework unifying video generation, 3D reconstruction, and latent prediction under Hamiltonian dynamics
  • Analysis of how Hamiltonian structure may improve interpretability, data efficiency, and long-horizon stability for robotic world models
  • Discussion of practical challenges (friction, contact, non-conservative forces, deformables) and proposed research directions
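One commonly cited reason that Hamiltonian structure can help long-horizon stability is that structure-preserving (symplectic) integrators keep energy bounded over long rollouts, whereas a generic update rule can drift. The sketch below compares explicit Euler with semi-implicit Euler on a unit harmonic oscillator. The system, step size, and function names are illustrative assumptions, not results or code from the paper.

```python
def explicit_euler(q, p, dt):
    """Generic first-order update; does not preserve phase-space structure."""
    return q + dt * p, p - dt * q

def symplectic_euler(q, p, dt):
    """Semi-implicit update: momentum first, then position with the new momentum."""
    p_next = p - dt * q
    q_next = q + dt * p_next
    return q_next, p_next

def final_energy(step_fn, steps=1000, dt=0.1):
    """Roll out H = 0.5*(q^2 + p^2) from (q, p) = (1, 0) and return the final energy."""
    q, p = 1.0, 0.0
    for _ in range(steps):
        q, p = step_fn(q, p, dt)
    return 0.5 * (q * q + p * p)

e_explicit = final_energy(explicit_euler)
e_symplectic = final_energy(symplectic_euler)
print("explicit Euler energy:  ", e_explicit)    # grows far above the initial 0.5
print("symplectic Euler energy:", e_symplectic)  # stays close to the initial 0.5
```

The comparison only covers a conservative toy system; the friction, contact, and other non-conservative effects listed above are exactly where such guarantees weaken and additional terms are needed.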

Significance

Provides a theoretically rigorous bridge between classical mechanics and modern generative world models, potentially enabling more stable and interpretable long-horizon rollouts in model-based robot RL.