World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry

Summary

World Action Verifier (WAV) is a framework that lets a world model detect its own prediction errors and self-improve. It decomposes action-conditioned state prediction into two easier sub-problems: state plausibility verification and action reachability verification. The key insight is that two asymmetries make each verification problem substantially easier than direct future prediction: action-free data is more broadly available than action-labeled data, and action-relevant features are lower-dimensional.
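The decomposition can be illustrated with a small sketch. This is not the paper's implementation: it assumes a toy one-dimensional world where the state is a position, the action is a displacement, and valid states lie in [0, 10]. The names `verify_prediction`, `toy_plausibility`, and `toy_inverse_dynamics` are hypothetical stand-ins for a learned plausibility model (trainable on action-free data) and a learned inverse dynamics model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    plausible: bool   # does the predicted state look like a valid state?
    reachable: bool   # could the commanded action have produced it?

    @property
    def accepted(self) -> bool:
        return self.plausible and self.reachable


def verify_prediction(
    state: float,
    action: float,
    predicted_next: float,
    plausibility_score: Callable[[float], float],
    inverse_dynamics: Callable[[float, float], float],
    plaus_threshold: float = 0.5,   # assumed threshold, not from the source
    action_tol: float = 0.1,        # assumed tolerance, not from the source
) -> Verdict:
    """Check a world-model prediction without a ground-truth next state."""
    # Sub-problem 1: state plausibility, independent of the action taken.
    plausible = plausibility_score(predicted_next) >= plaus_threshold
    # Sub-problem 2: action reachability, comparing the action inferred
    # from (state, predicted_next) against the action actually commanded.
    inferred_action = inverse_dynamics(state, predicted_next)
    reachable = abs(inferred_action - action) <= action_tol
    return Verdict(plausible, reachable)


# Toy stand-ins for the two learned verifiers (assumptions for illustration).
def toy_plausibility(s: float) -> float:
    return 1.0 if 0.0 <= s <= 10.0 else 0.0


def toy_inverse_dynamics(s: float, s_next: float) -> float:
    return s_next - s


ok = verify_prediction(2.0, 1.0, 3.0, toy_plausibility, toy_inverse_dynamics)
wrong_action = verify_prediction(2.0, 1.0, 4.0, toy_plausibility, toy_inverse_dynamics)
out_of_range = verify_prediction(9.5, 1.0, 10.5, toy_plausibility, toy_inverse_dynamics)
```

In this toy setup, `wrong_action` fails only the reachability check and `out_of_range` fails only the plausibility check, which shows how the two verifiers catch different kinds of error.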

Key Contributions

Decomposes action-conditioned prediction into state plausibility and action reachability, each verifiable with fewer resources
Leverages forward-inverse asymmetry: action-free data is orders of magnitude more available than action-labeled data
Self-improvement loop: WAV identifies world-model errors without ground-truth labels and uses them to guide targeted data collection or training
Validated on multiple robotic manipulation benchmarks; recognized as Outstanding Paper at the ICLR 2026 World Models Workshop
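A self-improvement loop of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the environment is a toy one-dimensional world with true dynamics `s + a`, the world model is a one-parameter linear model, and the inverse dynamics verifier is assumed to be accurate. All names (`LinearWorldModel`, `flags_error`, `self_improve`, `true_env_step`) are hypothetical.

```python
import random


def true_env_step(s: float, a: float) -> float:
    """Hypothetical environment, queried only for flagged transitions."""
    return s + a


class LinearWorldModel:
    """Toy world model: predicts s + gain * a (the correct gain is 1.0)."""

    def __init__(self, gain: float):
        self.gain = gain

    def predict(self, s: float, a: float) -> float:
        return s + self.gain * a

    def fit(self, transitions):
        # Least-squares estimate of gain from (s, a, s_next) triples.
        num = sum(a * (s_next - s) for s, a, s_next in transitions)
        den = sum(a * a for _, a, _ in transitions)
        if den > 0:
            self.gain = num / den


def inverse_dynamics(s: float, s_next: float) -> float:
    """Assumed-accurate verifier that infers the action from a state pair."""
    return s_next - s


def flags_error(model: LinearWorldModel, s: float, a: float, tol: float = 0.1) -> bool:
    # No ground-truth next state is used: only the model's own prediction.
    inferred = inverse_dynamics(s, model.predict(s, a))
    return abs(inferred - a) > tol


def self_improve(model, rounds: int = 3, n: int = 50, seed: int = 0):
    rng = random.Random(seed)
    for _ in range(rounds):
        candidates = [(rng.uniform(0, 5), rng.uniform(-1, 1)) for _ in range(n)]
        flagged = [(s, a) for s, a in candidates if flags_error(model, s, a)]
        if not flagged:
            break  # the verifier finds nothing left to correct
        # Targeted data collection: label only the flagged transitions.
        labeled = [(s, a, true_env_step(s, a)) for s, a in flagged]
        model.fit(labeled)
    return model


model = self_improve(LinearWorldModel(gain=0.5))
```

The design point is that the verification step spends no labels, so ground-truth queries go only to transitions the verifier already suspects are wrong.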

Significance

WAV introduces a principled self-supervision signal for world model improvement, a key step toward continual, autonomous refinement of robot simulators without expensive human labeling. The work comes from a strong multi-institution team (Stanford, UCSD, CMU, Google DeepMind, Harvard).
