Summary

This survey argues that future progress in vision-language-action (VLA) models will depend less on architectural innovation and more on co-designing high-fidelity data engines with structured evaluation protocols. Organized around three pillars (datasets, benchmarks, and data engines), it systematically analyzes the data infrastructure underlying embodied learning and identifies the critical bottlenecks that limit real-world deployment.

Key Contributions

  • Systematic review of VLA datasets covering scale, diversity, embodiment coverage, and annotation quality
  • Analysis of benchmark design principles, identifying gaps in current evaluation protocols
  • Survey of data engine pipelines including simulation, human teleoperation, and automated data collection
  • Argument for data infrastructure co-design as the primary driver of next-generation VLA advances

Significance

By reframing VLA progress as a data problem rather than a model problem, this survey offers the community a roadmap, highlighting where investment in data pipelines and evaluation benchmarks will yield the highest returns for embodied AI.