Summary
MWM is a world model for image-goal navigation that targets action-conditioned consistency failures under multi-step rollout: individual predicted frames can look visually plausible while the rollout as a whole drifts, which degrades planning quality. Training has two stages: structure pretraining, followed by Action-Conditioned Consistency (ACC) post-training, which explicitly trains the model under self-conditioned rollout contexts (its own earlier predictions) to reduce error accumulation.
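The gap between per-step plausibility and rollout quality can be illustrated with a toy example. The sketch below is not the paper's model: it assumes a hypothetical 1-D state, a true transition `state + action`, and an imperfect learned step (`model_step`) that overestimates each action by 5%. All names and numbers are illustrative assumptions.

```python
def true_step(state, action):
    # Ground-truth dynamics (assumed for illustration): position plus action.
    return state + action

def model_step(state, action, gain=1.05):
    # Hypothetical imperfect world model: overestimates each action by 5%.
    return state + gain * action

def teacher_forced_errors(states, actions, step):
    # Each prediction is conditioned on the REAL previous state.
    return [abs(step(s, a) - s_next)
            for s, a, s_next in zip(states, actions, states[1:])]

def self_conditioned_errors(states, actions, step):
    # Each prediction is conditioned on the model's OWN previous prediction.
    pred, errors = states[0], []
    for a, s_next in zip(actions, states[1:]):
        pred = step(pred, a)
        errors.append(abs(pred - s_next))
    return errors

actions = [1.0] * 8
states = [0.0]
for a in actions:
    states.append(true_step(states[-1], a))

tf = teacher_forced_errors(states, actions, model_step)
sc = self_conditioned_errors(states, actions, model_step)
print("teacher-forced error per step:", [round(e, 3) for e in tf])
print("self-conditioned error per step:", [round(e, 3) for e in sc])
```

In this toy setting the teacher-forced error stays at about 0.05 on every step, while the self-conditioned error grows with the horizon. This is the kind of accumulation that training under rollout contexts is meant to reduce.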
Key Contributions
- An Action-Conditioned Consistency (ACC) post-training stage that aligns autoregressive predictions with real observations under rollout
- Inference-Consistent State Distillation (ICSD), which extends consistency distillation to few-step diffusion while preserving action-conditioned rollout consistency
- Two-stage framework enabling efficient deployment with few-step inference without sacrificing planning coherence
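As a rough illustration of post-training against real observations under rollout, the sketch below tunes a single parameter of a toy model by minimizing a loss computed over its own self-conditioned rollout. This is an assumption-laden toy, not the ACC objective or the ICSD procedure from the paper: the 1-D dynamics, the `gain` parameter, the finite-difference gradient, and all function names are hypothetical.

```python
def rollout(gain, start, actions):
    # Self-conditioned rollout: each prediction feeds the next step.
    pred, preds = start, []
    for a in actions:
        pred = pred + gain * a
        preds.append(pred)
    return preds

def rollout_loss(gain, start, actions, observed):
    # Mean squared error between rolled-out predictions and real observations.
    preds = rollout(gain, start, actions)
    return sum((p - o) ** 2 for p, o in zip(preds, observed)) / len(observed)

def post_train(gain, start, actions, observed, lr=0.01, steps=50, h=1e-6):
    # Plain gradient descent on the rollout loss, using a central
    # finite-difference gradient so the sketch needs no autodiff library.
    for _ in range(steps):
        grad = (rollout_loss(gain + h, start, actions, observed)
                - rollout_loss(gain - h, start, actions, observed)) / (2 * h)
        gain -= lr * grad
    return gain

actions = [1.0] * 8
observed = [float(k) for k in range(1, 9)]  # real trajectory: true gain is 1.0

before = rollout_loss(1.05, 0.0, actions, observed)
tuned = post_train(1.05, 0.0, actions, observed)
after = rollout_loss(tuned, 0.0, actions, observed)
print(f"gain 1.05 -> {tuned:.6f}, rollout loss {before:.5f} -> {after:.2e}")
```

Because the loss is measured on the rolled-out trajectory rather than on single teacher-forced steps, late-horizon errors contribute more to the gradient, so the update directly penalizes accumulated drift.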
Significance
Addresses rollout drift, a critical but underappreciated failure mode of navigation world models, with a principled consistency-enforcement framework that is broadly applicable to diffusion-based world models.