Summary
DDP-WM introduces the Disentangled Dynamics Prediction (DDP) principle: latent state evolution is decomposed into sparse primary dynamics (driven by physical interactions) and secondary, context-driven background updates. The decomposition is realized through dynamic localization, which isolates the foreground primary dynamics, together with a cross-attention mechanism and a Low-Rank Correction Module (LRM) for background updates. The result is a world model that is ~9× faster at inference than dense Transformer baselines while achieving higher task success.
Key Contributions
- Disentangled Dynamics Prediction principle separating foreground physical interactions from background context
- Decoupled multi-stage process: dynamic localization → primary predictor → LRM background update
- ~9× inference speedup on the Push-T task vs. state-of-the-art dense models
- Success rate improvement from 90% to 98% on Push-T with MPC planning
- Validated across navigation, tabletop manipulation, deformable-object, and multi-body interaction tasks
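The decoupled process listed above can be illustrated with a toy sketch. This is not the paper's implementation: every function name, the change-threshold localization rule, the shift-by-action predictor, and the rank-1 correction are assumptions chosen only to show how expensive computation can be confined to a sparse set of foreground tokens while background tokens receive a cheap update. The cross-attention step is omitted for brevity.

```python
def localize(prev_tokens, curr_tokens, threshold=0.1):
    """Hypothetical dynamic localization: indices of tokens whose latent
    changed by more than `threshold` (L1 distance) between two steps."""
    return [
        i
        for i, (p, c) in enumerate(zip(prev_tokens, curr_tokens))
        if sum(abs(a - b) for a, b in zip(p, c)) > threshold
    ]


def primary_predictor(token, action):
    """Stand-in for the expensive foreground model: shifts the token by the action."""
    return [x + action for x in token]


def low_rank_correction(token, u, v):
    """Rank-1 update token + u * (v . token), a cheap stand-in for an LRM."""
    scale = sum(a * b for a, b in zip(v, token))
    return [x + ui * scale for x, ui in zip(token, u)]


def ddp_step(prev_tokens, curr_tokens, action, u, v, threshold=0.1):
    """Predict next-step tokens: heavy model on foreground, rank-1 update elsewhere."""
    foreground = set(localize(prev_tokens, curr_tokens, threshold))
    next_tokens = []
    for i, token in enumerate(curr_tokens):
        if i in foreground:
            next_tokens.append(primary_predictor(token, action))
        else:
            next_tokens.append(low_rank_correction(token, u, v))
    return next_tokens, sorted(foreground)
```

In this sketch only the tokens flagged by `localize` pass through `primary_predictor`, so the cost of the expensive model scales with the number of changed tokens rather than with the full token count.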
Significance
Addresses the computational bottleneck of dense Transformer world models while improving performance, making real-time world-model-based planning practical for a wider range of robotic systems.
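A back-of-the-envelope count shows why restricting attention to a sparse foreground can relieve this bottleneck. The sketch below counts only query-key pairs and assumes a hypothetical layout in which foreground tokens self-attend and each background token cross-attends to the foreground; the token counts are invented for illustration, and the result is not a reproduction of the reported ~9× speedup.

```python
def dense_pairs(n_tokens):
    """Query-key pairs scored by full self-attention over all tokens."""
    return n_tokens * n_tokens


def disentangled_pairs(n_tokens, n_foreground):
    """Pairs scored under the assumed layout: foreground self-attention
    plus background-to-foreground cross-attention."""
    n_background = n_tokens - n_foreground
    return n_foreground * n_foreground + n_background * n_foreground


def pair_ratio(n_tokens, n_foreground):
    """How many times fewer pairs the assumed disentangled layout scores."""
    return dense_pairs(n_tokens) / disentangled_pairs(n_tokens, n_foreground)
```

Under these assumptions the disentangled count simplifies to `n_tokens * n_foreground`, so the ratio is simply `n_tokens / n_foreground`: the sparser the foreground, the larger the saving in attention pairs.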