Accelerating and Scaling MPC-Guided Reinforcement Learning for Humanoid Locomotion and Manipulation

Summary

This paper studies efficient integration of Model Predictive Control (MPC) as a training-time reward signal for humanoid RL, combining the physical grounding of MPC with the robustness and whole-body skill diversity of large-scale RL. The key contribution is π^nMPC, a parallel-in-horizon, construction-free GPU MPC solver that operates directly on time-varying centroidal dynamics, making MPC guidance practical inside massively parallel RL training.
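For orientation, the standard textbook form of centroidal dynamics is written out below. This is not taken from the paper; the symbols (mass m, center of mass c, centroidal angular momentum L, gravity g, contact points p_i, contact forces f_i) are assumed notation.

```latex
\begin{aligned}
m\,\ddot{c}(t) &= m\,g + \sum_{i=1}^{n_c} f_i(t), \\
\dot{L}(t)     &= \sum_{i=1}^{n_c} \bigl(p_i(t) - c(t)\bigr) \times f_i(t).
\end{aligned}
```

One plausible reading of "time-varying" is that the contact set and contact locations p_i(t) change along the prediction horizon, so the dynamics matrices an MPC solver sees differ from stage to stage.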

Key Contributions

Centroidal-dynamics MPC reward formulation that provides physically grounded guidance during RL training
π^nMPC: a GPU-native parallel MPC solver that avoids expensive problem pre-compilation and reduces training overhead
Scales to massively parallel RL environments without prohibitive memory or compute costs
Validated on humanoid locomotion and manipulation tasks at Caltech and Johns Hopkins
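The idea of an MPC-derived reward over centroidal dynamics can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's formulation or the π^nMPC solver: the function names `centroidal_step` and `mpc_tracking_reward`, the explicit-Euler integration, and the exponential tracking-error reward are all hypothetical choices made for the example.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # m/s^2, world z-axis pointing up


def centroidal_step(com, vel, ang_mom, forces, contacts, mass, dt):
    """One explicit-Euler step of centroidal dynamics (illustrative only).

    com, vel, ang_mom: (3,) center of mass, its velocity, angular momentum.
    forces, contacts:  (n_contacts, 3) contact forces and contact positions.
    """
    # Linear part: m * c_ddot = m * g + sum_i f_i
    lin_acc = GRAVITY + forces.sum(axis=0) / mass
    # Angular part: L_dot = sum_i (p_i - c) x f_i
    torque = np.cross(contacts - com, forces).sum(axis=0)
    new_com = com + dt * vel
    new_vel = vel + dt * lin_acc
    new_ang_mom = ang_mom + dt * torque
    return new_com, new_vel, new_ang_mom


def mpc_tracking_reward(state, mpc_state, scale=1.0):
    """Hypothetical reward: exp(-scale * squared error to the MPC reference).

    Both inputs have shape (..., state_dim), so the function works on a
    single environment or on a batch of parallel environments.
    """
    sq_err = np.sum((state - mpc_state) ** 2, axis=-1)
    return np.exp(-scale * sq_err)
```

In a training loop, the MPC solver would supply `mpc_state` for every parallel environment at each step, and the policy would be rewarded for keeping the simulated centroidal state close to it. The batched signature is meant to reflect why solver throughput matters: the reference has to be recomputed for every environment.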

Significance

Brings MPC’s constraint satisfaction and physical grounding directly into large-scale RL for humanoids, addressing a longstanding bottleneck: MPC-guided RL has been too slow to be practical at realistic training scale.
