Summary
This survey argues that future progress in VLA models will depend less on architectural innovation and more on co-designing high-fidelity data engines with structured evaluation protocols. Organized around three pillars — datasets, benchmarks, and data engines — it systematically analyzes the data infrastructure underlying embodied learning and identifies the critical bottlenecks that limit real-world deployment.
Key Contributions
- Systematic review of VLA datasets covering scale, diversity, embodiment coverage, and annotation quality
- Analysis of benchmark design principles, identifying gaps in current evaluation protocols
- Survey of data engine pipelines including simulation, human teleoperation, and automated data collection
- Argument for data infrastructure co-design as the primary driver of next-generation VLA advances
Significance
By reframing VLA progress as a data problem rather than a model problem, this survey provides a critical roadmap for the community — highlighting where investment in data pipelines and evaluation benchmarks will yield the highest returns for embodied AI.