Summary
LaWM operationalizes the Principle of Least Action inside a learned visual latent space: it encodes observations into generalized coordinates, learns a discrete Lagrangian over consecutive latent states, and advances predictions by solving the corresponding discrete variational integration condition. Because the latent transition is induced by a variational principle rather than an unconstrained neural function, LaWM provides a structure-preserving inductive bias for long-horizon visual prediction.
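For reference, the textbook form of the condition a discrete variational integrator solves can be written as below. This is the standard discrete Euler-Lagrange equation from discrete mechanics, not notation taken from the LaWM paper: the encoder symbol $E_\phi$, the observation $o_k$, the parameter subscript $\theta$, and the slot-derivative notation $D_1$, $D_2$ are assumptions made for illustration.

```latex
% Assumed notation: q_k = E_\phi(o_k) is the latent (generalized) coordinate of frame o_k,
% and L_\theta(q_k, q_{k+1}) is the learned discrete Lagrangian.
\begin{align}
  S_d(q_0, \dots, q_N) &= \sum_{k=0}^{N-1} L_\theta(q_k, q_{k+1}), \\
  \frac{\partial S_d}{\partial q_k} = 0
  \;\Longrightarrow\;
  D_2 L_\theta(q_{k-1}, q_k) + D_1 L_\theta(q_k, q_{k+1}) &= 0,
  \qquad k = 1, \dots, N-1.
\end{align}
% D_1 and D_2 denote partial derivatives with respect to the first and second argument.
% Given (q_{k-1}, q_k), the second line is solved for q_{k+1} to advance the prediction.
```

Under this reading, each prediction step is an implicit solve for $q_{k+1}$ rather than a direct forward pass through a transition network.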
Key Contributions
- Latent variational integrator derived from a learned discrete Lagrangian, enforcing physical consistency without explicit physics supervision
- Improved physical invariance, background consistency, motion smoothness, and geometric prediction over video-generation and world-model baselines
- Validated on both physics-clean synthetic dynamics and embodied robot interaction benchmarks
Significance
LaWM directly addresses energy drift and physically inconsistent futures that plague existing long-horizon latent world models, offering a theoretically grounded alternative that integrates cleanly with standard robot learning pipelines.
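The energy-drift point can be illustrated on a toy system. The sketch below is not LaWM itself: it assumes a hand-written midpoint-rule discrete Lagrangian for a 1D harmonic oscillator in place of a learned one, uses central finite differences where an autodiff framework would normally be used, and solves the discrete Euler-Lagrange condition with a scalar Newton iteration. All function names are hypothetical. It compares the resulting energy drift with that of an explicit Euler rollout, which stands in for an unconstrained transition.

```python
import math


def discrete_lagrangian(q0, q1, h, m=1.0, k=1.0):
    """Midpoint-rule discrete Lagrangian of a harmonic oscillator.

    Hypothetical stand-in for a learned L_d(q_k, q_{k+1}).
    """
    v = (q1 - q0) / h
    q_mid = 0.5 * (q0 + q1)
    return h * (0.5 * m * v * v - 0.5 * k * q_mid * q_mid)


def del_residual(Ld, q_prev, q_curr, q_next, h, eps=1e-4):
    """D2 L_d(q_prev, q_curr) + D1 L_d(q_curr, q_next) via central differences."""
    d2 = (Ld(q_prev, q_curr + eps, h) - Ld(q_prev, q_curr - eps, h)) / (2 * eps)
    d1 = (Ld(q_curr + eps, q_next, h) - Ld(q_curr - eps, q_next, h)) / (2 * eps)
    return d2 + d1


def variational_step(Ld, q_prev, q_curr, h, iters=20, tol=1e-9, eps=1e-4):
    """Solve the discrete Euler-Lagrange condition for q_next (Newton's method)."""
    q_next = 2.0 * q_curr - q_prev  # initial guess: constant velocity
    for _ in range(iters):
        r = del_residual(Ld, q_prev, q_curr, q_next, h)
        if abs(r) < tol:
            break
        dr = (del_residual(Ld, q_prev, q_curr, q_next + eps, h)
              - del_residual(Ld, q_prev, q_curr, q_next - eps, h)) / (2 * eps)
        q_next -= r / dr
    return q_next


def rollout_variational(Ld, q0, q1, h, steps):
    """Roll the variational integrator forward from two initial positions."""
    qs = [q0, q1]
    for _ in range(steps):
        qs.append(variational_step(Ld, qs[-2], qs[-1], h))
    return qs


def energy_drift_variational(qs, h, m=1.0, k=1.0):
    """Max deviation of the midpoint energy from its initial value."""
    def energy(a, b):
        v = (b - a) / h
        q_mid = 0.5 * (a + b)
        return 0.5 * m * v * v + 0.5 * k * q_mid * q_mid

    e0 = energy(qs[0], qs[1])
    return max(abs(energy(a, b) - e0) for a, b in zip(qs, qs[1:]))


def energy_drift_euler(q, v, h, steps, m=1.0, k=1.0):
    """Max energy deviation of an explicit Euler rollout (baseline)."""
    e0 = 0.5 * m * v * v + 0.5 * k * q * q
    drift = 0.0
    for _ in range(steps):
        q, v = q + h * v, v - h * (k / m) * q  # both updates use the old state
        drift = max(drift, abs(0.5 * m * v * v + 0.5 * k * q * q - e0))
    return drift


if __name__ == "__main__":
    h, steps = 0.1, 1000
    qs = rollout_variational(discrete_lagrangian, 1.0, math.cos(h), h, steps)
    print("variational drift:", energy_drift_variational(qs, h))
    print("explicit Euler drift:", energy_drift_euler(1.0, 0.0, h, steps))
```

On this toy problem the variational rollout keeps the energy error bounded over the whole horizon while the explicit Euler energy grows at every step. In a latent world model, `discrete_lagrangian` would presumably be a neural network over encoded states and the derivatives would come from automatic differentiation. The summary does not specify either detail, so both are left as assumptions here.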