Summary
This empirical study systematically examines which factors actually drive sim-to-real generalization for vision-language-action (VLA) dexterous manipulation policies, along four dimensions: multi-level domain randomization, photorealistic rendering, physics-realistic modeling, and reinforcement learning updates. Through controlled experiments on a standardized benchmark, using publicly available robotic platforms and evaluation protocols, the paper offers principled, reproducible guidance for practitioners.
Key Contributions
- Comprehensive ablation across four sim-to-real transfer axes (domain randomization, rendering fidelity, physics accuracy, RL fine-tuning) applied to VLA policies
- Public release of the robotic platforms and evaluation protocol, enabling independent verification and benchmark comparisons
- Establishes a realistic, standardized benchmark for dexterous manipulation policies trained via sim-to-real RL
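To make the shape of a four-axis ablation concrete, here is a minimal sketch that treats each axis as an on/off factor. It assumes a simple binary encoding and uses hypothetical axis names; the paper's actual protocol (e.g. graded randomization levels or partial factorial designs) may differ. It contrasts a one-factor-at-a-time design, which removes each axis from a full-pipeline baseline, with a full factorial design over all combinations.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical labels for the four axes named in the summary.
AXES: Sequence[str] = (
    "domain_randomization",
    "photorealistic_rendering",
    "physics_realism",
    "rl_finetuning",
)


def one_factor_ablations(axes: Sequence[str] = AXES) -> List[Dict[str, bool]]:
    """Full-pipeline baseline plus one config per axis with that axis disabled."""
    baseline = {axis: True for axis in axes}
    configs = [dict(baseline)]
    for axis in axes:
        cfg = dict(baseline)
        cfg[axis] = False  # ablate exactly one axis
        configs.append(cfg)
    return configs


def full_factorial(axes: Sequence[str] = AXES) -> List[Dict[str, bool]]:
    """Every on/off combination of the axes (2 ** len(axes) configs)."""
    return [dict(zip(axes, bits)) for bits in product((False, True), repeat=len(axes))]


if __name__ == "__main__":
    print(len(one_factor_ablations()), "one-factor configs")
    print(len(full_factorial()), "full-factorial configs")
```

The one-factor design needs only five training runs but cannot reveal interactions between axes (for example, whether RL fine-tuning matters more when rendering is less realistic). The full factorial design captures those interactions at the cost of sixteen runs, which is expensive when each configuration also requires real-robot evaluation.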
Significance
Fills a critical gap in the sim-to-real literature by grounding algorithmic recommendations in real-world dexterous manipulation tasks rather than toy benchmarks, giving practitioners actionable, reproducible insights for VLA-based robot training pipelines.