Summary
RLinf-Co moves beyond SFT-only sim-real co-training by using closed-loop RL in simulation while anchoring the policy to real-world data through an auxiliary supervised loss. The two-stage framework first warm-starts the policy with SFT on mixed real and simulated demonstrations, then fine-tunes it with RL in simulation, adding a real-world regularization term to the overall objective. Evaluated on four real-world tabletop manipulation tasks with OpenVLA and π₀.₅, RLinf-Co improves real-world success rate by +24% and +20%, respectively, over real-only fine-tuning.
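One way to write the stage-two objective described above is given here as a sketch, not as the paper's exact formulation. The weight λ, the symbol names, and the choice of a negative log-likelihood for the supervised term are all assumptions introduced for illustration; the source only states that an RL objective in simulation is combined with an auxiliary supervised loss on real-world data.

```latex
\mathcal{L}(\theta)
  = \mathcal{L}_{\mathrm{RL}}^{\mathrm{sim}}(\theta)
  + \lambda \, \mathcal{L}_{\mathrm{SFT}}^{\mathrm{real}}(\theta),
\qquad
\mathcal{L}_{\mathrm{SFT}}^{\mathrm{real}}(\theta)
  = -\,\mathbb{E}_{(o,a)\sim\mathcal{D}_{\mathrm{real}}}
    \big[\log \pi_\theta(a \mid o)\big]
```

Under this reading, λ = 0 recovers plain RL in simulation, and larger values of λ pull the policy harder toward the real-world demonstrations.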
Key Contributions
- RL-based sim-real co-training framework that exploits closed-loop interaction in simulation rather than static SFT
- Two-stage design: SFT warm-start on mixed data → RL fine-tuning with real-data auxiliary supervised loss
- Real-world regularization term in the RL objective prevents catastrophic forgetting of real-world capabilities
- Validated on OpenVLA (+24%) and π₀.₅ (+20%) across four real tabletop manipulation tasks
- Built on the RLinf open-source RL infrastructure for embodied AI
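The two-stage design in the list above can be sketched on toy numbers. This is a minimal illustration under stated assumptions, not the RLinf-Co implementation: the function names, the REINFORCE-style surrogate used for the RL term, and the weight `lam` are all hypothetical, and the inputs are plain lists of per-sample action log-probabilities rather than outputs of a real VLA policy.

```python
def sft_loss(log_probs):
    """Mean negative log-likelihood of demonstrated actions."""
    return -sum(log_probs) / len(log_probs)


def pg_loss(log_probs, advantages):
    """REINFORCE-style surrogate: -mean(advantage * log pi(a|o)).

    Assumption: the actual RL algorithm may differ; this is only
    a stand-in for "an RL loss computed on simulation rollouts".
    """
    weighted = [adv * lp for lp, adv in zip(log_probs, advantages)]
    return -sum(weighted) / len(weighted)


def stage1_loss(real_log_probs, sim_log_probs):
    """Stage 1: SFT warm-start on pooled real and simulated demos."""
    return sft_loss(list(real_log_probs) + list(sim_log_probs))


def stage2_loss(sim_log_probs, sim_advantages, real_log_probs, lam=0.5):
    """Stage 2: RL loss on sim rollouts plus a real-data SFT anchor.

    `lam` (hypothetical) weights the real-world regularization term.
    """
    rl_term = pg_loss(sim_log_probs, sim_advantages)
    real_term = sft_loss(real_log_probs)
    return rl_term + lam * real_term


if __name__ == "__main__":
    real_lp = [-0.5, -1.5]   # log pi(a|o) on real demonstrations
    sim_lp = [-1.0, -2.0]    # log pi(a|o) on simulated samples
    sim_adv = [1.0, -1.0]    # advantages from simulation rollouts
    print("stage 1:", stage1_loss(real_lp, sim_lp))
    print("stage 2:", stage2_loss(sim_lp, sim_adv, real_lp, lam=0.5))
```

Keeping the real-data term as a separate, weighted summand makes the anchoring strength a single tunable knob: setting `lam=0` reduces stage 2 to RL in simulation alone.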
Significance
RLinf-Co's results suggest that closed-loop interaction in simulation provides a stronger training signal than SFT on simulated demonstrations alone, while the real-data anchor addresses the forgetting problem that has limited prior sim-RL approaches for VLAs.