Summary
WEAVER (World Estimation Across Views for Embodied Reasoning) is a multi-view world model that jointly satisfies three key desiderata for robot world models: fidelity (realistic trajectory predictions), consistency (coherence over long horizons), and efficiency (fast inference). Trained with a flow-matching loss to predict future latents and reward values, WEAVER achieves state-of-the-art results across policy evaluation, policy improvement, and test-time planning.
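To make the training objective concrete, the following is a minimal sketch of a flow-matching loss for one future latent plus a reward value. It is not WEAVER's implementation. It assumes a straight-line path between Gaussian noise and the target latent (a common flow-matching choice that this summary does not confirm) and a plain squared-error term for the reward. The names `interpolate`, `flow_matching_loss`, `predict`, and `reward_weight` are hypothetical, and `predict` stands in for a network that would also condition on past multi-view latents and actions.

```python
def interpolate(noise, target, t):
    """Point at time t on the straight path from noise (t=0) to target (t=1)."""
    return [(1.0 - t) * n + t * x for n, x in zip(noise, target)]


def flow_matching_loss(predict, noise, target_latent, t, target_reward,
                       reward_weight=1.0):
    """Squared-error loss on the predicted velocity plus a reward term.

    predict(x_t, t) is a hypothetical model returning (velocity, reward).
    The straight-line path and the reward term are assumptions of this sketch.
    """
    x_t = interpolate(noise, target_latent, t)
    # For a straight-line path the target velocity is constant: target - noise.
    velocity_target = [x - n for n, x in zip(noise, target_latent)]
    velocity_pred, reward_pred = predict(x_t, t)
    latent_loss = sum(
        (p - v) ** 2 for p, v in zip(velocity_pred, velocity_target)
    ) / len(velocity_target)
    reward_loss = (reward_pred - target_reward) ** 2
    return latent_loss + reward_weight * reward_loss


# Toy usage with a dummy predictor that always outputs zeros.
noise = [0.0, 1.0]
target = [1.0, 3.0]
zero_model = lambda x_t, t: ([0.0] * len(x_t), 0.0)
loss = flow_matching_loss(zero_model, noise, target, t=0.5, target_reward=1.0)
```

In practice the noise and the time `t` would be sampled fresh for every training example, and the loss would be averaged over a batch.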
Key Contributions
- Multi-view latent world model trained with a flow-matching loss for future-latent prediction and reward estimation
- Policy evaluation: correlation of ρ = 0.870 between world-model estimates and real-world success rates, reported as the best published result
- Policy improvement: 38% real-world success rate gain on top of the π₀.₅ foundation model
- Test-time planning: 14% additional improvement with 5–10× speedup over prior world models
- Superior out-of-distribution generalization compared to previous world model approaches
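As an illustration of the policy-evaluation metric above, the sketch below computes a correlation between success rates estimated inside a world model and success rates measured on a real robot. It assumes the Pearson coefficient, since this summary does not say which coefficient ρ denotes. The `pearson` helper and the four success-rate pairs are made up for the example and are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up success rates for four hypothetical policies (not from the source).
predicted = [0.20, 0.45, 0.60, 0.85]  # estimated by rolling out in the world model
real = [0.25, 0.40, 0.70, 0.80]       # measured on the physical robot
rho = pearson(predicted, real)
```

A value of ρ close to 1 means that ranking policies by their world-model score would closely match ranking them by real-world performance, which is what makes the model useful as an evaluator.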
Significance
WEAVER demonstrates that a single world model architecture can simultaneously serve as evaluator, trainer, and planner for robot policies, with strong empirical gains on top of state-of-the-art foundation models.