Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models

Summary

This empirical study systematically examines what actually drives sim-to-real generalization for VLA-based dexterous manipulation policies across four dimensions: multi-level domain randomization, photorealistic rendering, physics-realistic modeling, and reinforcement learning updates. By conducting controlled experiments on a standardized benchmark with public robotic platforms and evaluation protocols, the paper provides principled, reproducible guidance for practitioners.

Key Contributions

Comprehensive ablation across four sim-to-real transfer axes (domain randomization, rendering fidelity, physics accuracy, RL fine-tuning) applied to VLA policies
Public release of robotic platforms and evaluation protocol enabling independent verification and benchmark comparisons
A realistic, standardized benchmark for dexterous manipulation policies trained via sim-to-real RL
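
As a rough illustration of what an ablation over the four axes listed above could look like in practice, the sketch below enumerates experiment configurations in Python. It is not the paper's actual protocol: treating each axis as a simple on/off switch, the axis identifiers, and the helper names `ablation_grid` and `single_axis_ablations` are all assumptions made for this example.

```python
from itertools import product

# Axis names follow the four dimensions named in the summary. Modeling each
# one as a binary switch is an illustrative assumption, not the paper's setup.
AXES = (
    "domain_randomization",
    "photorealistic_rendering",
    "physics_realistic_modeling",
    "rl_fine_tuning",
)


def ablation_grid(axes=AXES):
    """Return every on/off combination of the given axes as a list of dicts."""
    return [
        dict(zip(axes, flags))
        for flags in product((False, True), repeat=len(axes))
    ]


def single_axis_ablations(axes=AXES):
    """Return leave-one-out configs: exactly one axis disabled in each."""
    return [{name: name != dropped for name in axes} for dropped in axes]


if __name__ == "__main__":
    # Full factorial design over four binary axes.
    print(len(ablation_grid()))
    # Leave-one-out design: one config per axis.
    for config in single_axis_ablations():
        print(config)
```

A full factorial grid grows exponentially with the number of axes, so a leave-one-out design like `single_axis_ablations` is a common cheaper alternative when each configuration requires real-robot evaluation.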

Significance

The study fills a gap in the sim-to-real literature by grounding algorithm recommendations in real-world dexterous manipulation tasks rather than toy benchmarks, giving practitioners actionable, reproducible guidance for VLA-based robot training pipelines.
