Summary

MWM is a world model for image-goal navigation that targets action-conditioned consistency failures under multi-step rollout: even when each predicted frame is visually plausible, the rollout can drift away from what the executed actions would actually produce, which degrades planning quality. MWM uses a two-stage training pipeline: structure pretraining, followed by Action-Conditioned Consistency (ACC) post-training, which explicitly trains the model on self-conditioned rollout contexts (contexts built from its own earlier predictions rather than from ground-truth frames) so that errors accumulate less over the rollout.
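A toy sketch may make the difference between the two training regimes concrete. It is not MWM's model or loss: it assumes a hypothetical 1-D nonlinear "world" and a linear predictor, and every name in it (`true_step`, `pretrain`, `post_train`, etc.) is illustrative. Stage 1 here stands in for pretraining with plain one-step prediction conditioned on real states (teacher forcing, solved by least squares); whether MWM's structure pretraining works this way is an assumption. Stage 2 fine-tunes the same parameters on a rollout loss in which each prediction is conditioned on the model's own previous prediction, i.e. the self-conditioned setting that ACC trains under.

```python
# Hypothetical toy; not MWM's architecture or objective.

def true_step(s, a):
    # Nonlinear "world" that a linear model cannot represent exactly.
    return 0.8 * s + 0.5 * a + 0.2 * s * s


def model_step(params, s, a):
    # Linear one-step predictor with parameters (a_hat, b_hat).
    a_hat, b_hat = params
    return a_hat * s + b_hat * a


def simulate(s0, actions):
    # Ground-truth observations s_0, ..., s_T for one action sequence.
    states = [s0]
    for a in actions:
        states.append(true_step(states[-1], a))
    return states


def teacher_forced_loss(params, episodes):
    # Each prediction is conditioned on the *real* previous state.
    errs = [(model_step(params, s, a) - s_next) ** 2
            for states, actions in episodes
            for s, a, s_next in zip(states, actions, states[1:])]
    return sum(errs) / len(errs)


def rollout_loss(params, episodes):
    # Self-conditioned: each prediction is conditioned on the model's own
    # previous prediction, so one-step errors can accumulate along the rollout.
    errs = []
    for states, actions in episodes:
        s_pred = states[0]
        for a, s_true in zip(actions, states[1:]):
            s_pred = model_step(params, s_pred, a)
            errs.append((s_pred - s_true) ** 2)
    return sum(errs) / len(errs)


def pretrain(episodes):
    # Stage 1: closed-form least squares, the exact minimizer of the teacher-forced loss.
    ss = sa = aa = sy = ay = 0.0
    for states, actions in episodes:
        for s, a, y in zip(states, actions, states[1:]):
            ss += s * s
            sa += s * a
            aa += a * a
            sy += s * y
            ay += a * y
    det = ss * aa - sa * sa
    return ((sy * aa - sa * ay) / det, (ss * ay - sa * sy) / det)


def post_train(params, episodes, lr=0.05, iters=200, eps=1e-6):
    # Stage 2: descend the rollout loss using a central finite-difference
    # gradient, halving the step until the loss decreases so it never goes up.
    params = list(params)
    loss = rollout_loss(params, episodes)
    for _ in range(iters):
        grad = []
        for i in range(len(params)):
            up, down = list(params), list(params)
            up[i] += eps
            down[i] -= eps
            grad.append((rollout_loss(up, episodes) - rollout_loss(down, episodes)) / (2 * eps))
        step = lr
        while step > 1e-12:
            cand = [p - step * g for p, g in zip(params, grad)]
            cand_loss = rollout_loss(cand, episodes)
            if cand_loss < loss:
                params, loss = cand, cand_loss
                break
            step /= 2
    return tuple(params)


action_seqs = [
    [0.5, -0.3, 0.8, 0.1, -0.6, 0.4, 0.2, -0.1],
    [-0.4, 0.7, 0.0, -0.5, 0.3, 0.6, -0.2, 0.1],
]
episodes = [(simulate(s0, acts), acts) for s0, acts in zip([0.2, -0.3], action_seqs)]
pre = pretrain(episodes)
post = post_train(pre, episodes)
print("teacher-forced loss:", teacher_forced_loss(pre, episodes), "->", teacher_forced_loss(post, episodes))
print("rollout loss:       ", rollout_loss(pre, episodes), "->", rollout_loss(post, episodes))
```

Because stage 1 already minimizes the one-step loss exactly, post-training cannot lower it; what it buys instead is a lower multi-step rollout loss. In miniature, this mirrors the motivation given for ACC: a good one-step fit does not by itself guarantee consistent rollouts.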

Key Contributions

  • Action-Conditioned Consistency (ACC) post-training stage to align autoregressive predictions with real observations under rollout
  • Inference-Consistent State Distillation (ICSD): extends consistency distillation to few-step diffusion while preserving action-conditioned rollout consistency
  • Two-stage framework enabling efficient deployment with few-step inference without sacrificing planning coherence
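The summary does not give ICSD's objective, so the sketch below shows only plain consistency distillation, the technique ICSD is described as extending, on a hypothetical 1-D Gaussian toy. The "teacher" is the exact denoiser for Gaussian data, standing in for a pretrained many-step diffusion model. The student f(x, σ_n) = MU + w[n](x − MU) is trained so that its output at noise level σ_{n+1} matches its own stop-gradient output at σ_n after one teacher ODE step, with w[0] = 1 as the boundary condition f(x, σ_min) = x. Because everything in the toy is linear, each level's regression is solved in closed form rather than by SGD with an EMA target network as in practice. The constants `MU`, `S`, the σ grid, and all function names are arbitrary assumptions.

```python
import math
import random

# Hypothetical constants: in a world model the next-observation distribution
# would depend on context and action; here it is fixed to N(MU, S^2).
MU, S = 2.0, 0.5


def teacher_denoiser(x, sigma):
    # Stand-in for a pretrained diffusion model: exact E[y | y + sigma*z = x]
    # when y ~ N(MU, S^2) and z ~ N(0, 1).
    return MU + S * S / (S * S + sigma * sigma) * (x - MU)


def teacher_euler_step(x, sigma_hi, sigma_lo):
    # One Euler step of the probability-flow ODE dx/dsigma = (x - D(x, sigma)) / sigma.
    slope = (x - teacher_denoiser(x, sigma_hi)) / sigma_hi
    return x + (sigma_lo - sigma_hi) * slope


def consistency_distill(sigmas, n_pairs=256, seed=0):
    # Student f(x, sigmas[n]) = MU + w[n] * (x - MU); w[0] = 1 enforces f(x, sigma_min) = x.
    rng = random.Random(seed)
    w = [1.0]
    for n in range(len(sigmas) - 1):
        num = den = 0.0
        for _ in range(n_pairs):
            y = rng.gauss(MU, S)                              # clean sample
            x_hi = y + sigmas[n + 1] * rng.gauss(0.0, 1.0)    # noised to level n+1
            x_lo = teacher_euler_step(x_hi, sigmas[n + 1], sigmas[n])
            target = MU + w[n] * (x_lo - MU)                  # stop-gradient target
            # Accumulate the 1-D least-squares fit of w[n+1] to the target.
            num += (x_hi - MU) * (target - MU)
            den += (x_hi - MU) ** 2
        w.append(num / den)
    return w


def one_step_samples(w, sigma_max, n, seed=1):
    # Few-step inference taken to the extreme: one student call per sample.
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = MU + sigma_max * rng.gauss(0.0, 1.0)  # noise at sigma_max (centered at MU for simplicity)
        out.append(MU + w[-1] * (x - MU))
    return out


sigmas = [0.01 * 1000 ** (k / 40) for k in range(41)]  # geometric grid, 0.01 to 10
w = consistency_distill(sigmas)
samples = one_step_samples(w, sigmas[-1], 20000)
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
exact_w = math.sqrt(S ** 2 + sigmas[0] ** 2) / math.sqrt(S ** 2 + sigmas[-1] ** 2)
print(f"learned w at sigma_max = {w[-1]:.4f}, exact = {exact_w:.4f}")
print(f"one-step samples: mean = {mean:.3f}, std = {std:.3f} (target {MU}, {S})")
```

After distillation, a single student evaluation maps noise at σ_max to an approximate sample, which is where the few-step speedup comes from. The action-conditioned rollout consistency that ICSD is said to preserve on top of this is not modeled here.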

Significance

Addresses rollout drift, a critical but underappreciated failure mode of navigation world models, with a principled consistency-enforcement framework that applies broadly to diffusion-based world models.