LaWM: Least Action World Models for Long-Horizon Physical Consistency from Visual Observations

Summary

LaWM operationalizes the Principle of Least Action inside a learned visual latent space: it encodes observations into generalized coordinates, learns a discrete Lagrangian over consecutive latent states, and advances predictions by solving the corresponding discrete variational integration condition. Because the latent transition is induced by a variational principle rather than an unconstrained neural function, LaWM provides a structure-preserving inductive bias for long-horizon visual prediction.
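
The transition rule described above can be illustrated with a toy sketch. It is not LaWM's implementation: the discrete Lagrangian here is a hand-written midpoint-rule Lagrangian for a 1-D harmonic oscillator standing in for the learned network, there is no visual encoder, and the names (discrete_lagrangian, variational_step, rollout), the constants H, M, K, the finite-difference derivatives, and the secant solver are all assumptions made for illustration. Given two consecutive states q_prev and q_curr, the next state q_next is taken as the root of the standard discrete Euler-Lagrange condition D2 L_d(q_prev, q_curr) + D1 L_d(q_curr, q_next) = 0. An explicit Euler rollout is included only as a contrast for energy drift.

```python
import math

H, M, K = 0.1, 1.0, 1.0  # assumed step size, mass, stiffness (toy values)

def discrete_lagrangian(q0, q1):
    """Midpoint-rule L_d for a harmonic oscillator (stand-in for a learned L_d)."""
    v = (q1 - q0) / H
    q_mid = 0.5 * (q0 + q1)
    return H * (0.5 * M * v * v - 0.5 * K * q_mid * q_mid)

def energy(q0, q1):
    """Energy evaluated with the same midpoint velocity and position."""
    v = (q1 - q0) / H
    q_mid = 0.5 * (q0 + q1)
    return 0.5 * M * v * v + 0.5 * K * q_mid * q_mid

def del_residual(Ld, q_prev, q_curr, q_next, eps=1e-6):
    """D2 L_d(q_prev, q_curr) + D1 L_d(q_curr, q_next) via central differences."""
    d2 = (Ld(q_prev, q_curr + eps) - Ld(q_prev, q_curr - eps)) / (2 * eps)
    d1 = (Ld(q_curr + eps, q_next) - Ld(q_curr - eps, q_next)) / (2 * eps)
    return d2 + d1

def variational_step(Ld, q_prev, q_curr, tol=1e-9, max_iter=50):
    """Solve the discrete Euler-Lagrange condition for q_next (secant method)."""
    x0 = q_curr
    x1 = 2.0 * q_curr - q_prev  # constant-velocity initial guess
    if x1 == x0:
        x1 = x0 + 1e-3  # keep the two secant points distinct
    f0 = del_residual(Ld, q_prev, q_curr, x0)
    f1 = del_residual(Ld, q_prev, q_curr, x1)
    for _ in range(max_iter):
        if abs(f1) < tol or f1 == f0:
            break
        x0, x1, f0 = x1, x1 - f1 * (x1 - x0) / (f1 - f0), f1
        f1 = del_residual(Ld, q_prev, q_curr, x1)
    return x1

def rollout(Ld, q0, q1, n_steps):
    """Advance n_steps from two seed states; returns n_steps + 2 positions."""
    qs = [q0, q1]
    for _ in range(n_steps):
        qs.append(variational_step(Ld, qs[-2], qs[-1]))
    return qs

def euler_energy_growth(q, v, n_steps):
    """Ratio of final to initial energy under explicit Euler (contrast case)."""
    e0 = 0.5 * M * v * v + 0.5 * K * q * q
    for _ in range(n_steps):
        q, v = q + H * v, v - H * (K / M) * q
    return (0.5 * M * v * v + 0.5 * K * q * q) / e0

if __name__ == "__main__":
    qs = rollout(discrete_lagrangian, 1.0, math.cos(H), 500)
    es = [energy(a, b) for a, b in zip(qs, qs[1:])]
    drift = max(abs(e - es[0]) for e in es) / es[0]
    print("variational relative energy drift:", drift)
    print("explicit Euler energy growth factor:", euler_energy_growth(1.0, 0.0, 500))
```

In this toy setting the variational rollout keeps the energy essentially constant over hundreds of steps, while the unconstrained explicit update inflates it steadily. In a learned setting one would presumably replace the finite differences with automatic differentiation and the scalar secant solve with a vector root-finder; those choices are not specified in this summary.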

Key Contributions

Latent variational integrator derived from a learned discrete Lagrangian, enforcing physical consistency without explicit physics supervision
Improved physical invariance, background consistency, motion smoothness, and geometric prediction over video-generation and world-model baselines
Validated on both physics-clean synthetic dynamics and embodied robot interaction benchmarks

Significance

LaWM targets two failure modes of existing long-horizon latent world models, energy drift and physically inconsistent futures, and offers a theoretically grounded alternative that integrates with standard robot learning pipelines.
