Summary

This empirical study systematically ablates algorithmic, systems, and experimental design choices for sim-to-online RL on physical robots, spanning 100 real-world training runs on three distinct robotic platforms. It finds that several widely adopted defaults are in fact harmful, and that a small set of principled choices (retaining data across real-world trials and delaying critic updates) yields stable, reliable online RL without extensive engineering overhead.

Key Contributions

  • Comprehensive 100-run ablation of design choices for sim-to-online RL across three real robot platforms
  • Identifies harmful defaults in standard RL practice that degrade real-robot performance
  • Key recipe: retain replay-buffer data across real-world trials and delay critic updates for stability
  • Actionable, hardware-agnostic guidelines that reduce the engineering barrier for deploying online RL on real robots
  • Empirical study bridging the sim-to-online gap with practical, reproducible findings
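
The two ingredients of the key recipe can be illustrated with a small toy loop. This is only a sketch under stated assumptions, not the study's implementation: the summary does not define exactly how critic updates are delayed, so the sketch assumes one possible reading (skipping critic updates for the first few steps of each trial). The names PersistentReplayBuffer, critic_update_allowed, run_trials, and delay_steps are hypothetical, and the transitions are placeholders rather than real robot data.

```python
import random
from collections import deque


class PersistentReplayBuffer:
    """Replay buffer that is NOT cleared between real-world trials."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transitions once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng=random):
        # Sample without replacement, capped at the current buffer size.
        k = min(batch_size, len(self.storage))
        return rng.sample(list(self.storage), k)

    def __len__(self):
        return len(self.storage)


def critic_update_allowed(steps_in_trial, delay_steps):
    """Hypothetical gate (an assumption, not the paper's rule):
    hold off critic updates for the first delay_steps of each trial."""
    return steps_in_trial >= delay_steps


def run_trials(num_trials, steps_per_trial, delay_steps, buffer):
    """Toy loop; returns how many critic updates would have been performed."""
    critic_updates = 0
    for trial in range(num_trials):
        # The buffer is deliberately not reset at the start of a trial.
        for step in range(steps_per_trial):
            # Placeholder for a real (obs, action, reward, next_obs) tuple.
            buffer.add((trial, step))
            if critic_update_allowed(step, delay_steps):
                batch = buffer.sample(4)
                # A real critic gradient step on `batch` would go here.
                critic_updates += 1
    return critic_updates


buffer = PersistentReplayBuffer(capacity=1000)
num_updates = run_trials(num_trials=3, steps_per_trial=10, delay_steps=5, buffer=buffer)
```

In this toy run the buffer keeps accumulating transitions from all three trials, and the critic is only touched in the later part of each trial. Other readings of "delaying critic updates" (for example, a single warm-up phase before any update) would change only the gate function.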

Significance

By grounding design choices in large-scale real-robot experiments rather than simulation benchmarks, this work gives the community a reliable starting recipe for online RL deployment and directly addresses the reproducibility gap in real-world robot learning.