Summary

This empirical study systematically examines what drives sim-to-real generalization for vision-language-action (VLA) dexterous manipulation policies along four dimensions: multi-level domain randomization, photorealistic rendering, physics-realistic modeling, and reinforcement learning updates. Through controlled experiments on a standardized benchmark with public robotic platforms and evaluation protocols, the paper offers principled, reproducible guidance for practitioners.
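To make the first of these dimensions concrete, the following is a minimal sketch of how domain randomization could be toggled per level in an ablation. It is not the paper's implementation: the two levels ("visual", "physics"), the parameter names, the nominal values, and the ranges are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeParams:
    """Simulator settings for one training episode (all names hypothetical)."""
    light_intensity: float    # visual level
    camera_jitter_deg: float  # visual level
    friction: float           # physics level
    object_mass_kg: float     # physics level


# Hypothetical nominal values used when a level is NOT randomized.
NOMINAL = {
    "light_intensity": 1.0,
    "camera_jitter_deg": 0.0,
    "friction": 0.8,
    "object_mass_kg": 0.2,
}

# Hypothetical uniform sampling ranges (low, high) for each parameter.
RANGES = {
    "light_intensity": (0.5, 1.5),
    "camera_jitter_deg": (-5.0, 5.0),
    "friction": (0.4, 1.2),
    "object_mass_kg": (0.05, 0.5),
}

# Assumed grouping of parameters into randomization levels.
LEVELS = {
    "visual": ("light_intensity", "camera_jitter_deg"),
    "physics": ("friction", "object_mass_kg"),
}


def sample_episode_params(rng, enabled_levels):
    """Sample parameters, randomizing only those in the enabled levels.

    Parameters in disabled levels keep their nominal value, so an ablation
    can switch each level on or off independently.
    """
    values = dict(NOMINAL)
    for level in sorted(enabled_levels):  # sorted for reproducible draw order
        for name in LEVELS[level]:
            low, high = RANGES[name]
            values[name] = rng.uniform(low, high)
    return EpisodeParams(**values)


if __name__ == "__main__":
    rng = random.Random(0)
    # Ablation arm: randomize visuals only, keep physics nominal.
    print(sample_episode_params(rng, ["visual"]))
    # Ablation arm: randomize both levels.
    print(sample_episode_params(rng, ["visual", "physics"]))
```

Keeping the level grouping in a single table means each ablation arm is just a different list of enabled levels, which is one simple way to keep such comparisons controlled.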

Key Contributions

  • Comprehensive ablation across four sim-to-real transfer axes (domain randomization, rendering fidelity, physics accuracy, RL fine-tuning) applied to VLA policies
  • Public robotic platforms and evaluation protocol that enable independent verification and benchmark comparisons
  • A realistic, standardized benchmark for dexterous manipulation policies trained via sim-to-real RL

Significance

The study fills a critical gap in the sim-to-real literature by grounding its algorithm recommendations in real-world dexterous manipulation tasks rather than toy benchmarks, giving practitioners actionable, reproducible insights for VLA-based robot training pipelines.