HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation

Summary

HEX is a state-centric vision-language-action (VLA) framework for coordinated whole-body manipulation on full-sized bipedal humanoid robots. It introduces a humanoid-aligned universal state representation that supports scalable learning across heterogeneous embodiments, and a Mixture-of-Experts Unified Proprioceptive Predictor that models whole-body coordination and temporal motion dynamics from large-scale multi-embodiment trajectory data. Lightweight history tokens summarize past observations, providing temporal context without re-encoding past images.
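To make the idea of a humanoid-aligned universal state concrete, here is a minimal sketch of one way heterogeneous robots could share a single state layout: each embodiment writes its readings into fixed humanoid body-part slots, and a mask marks which entries are actually present. The slot names, slot widths, and the `to_universal_state` helper are illustrative assumptions, not the layout HEX uses.

```python
# Hypothetical humanoid-aligned slot layout: slot name -> number of values.
# Names and widths are illustrative assumptions, not taken from HEX.
HUMANOID_SLOTS = {
    "left_arm": 7,
    "right_arm": 7,
    "left_hand": 1,
    "right_hand": 1,
    "torso": 3,
    "left_leg": 6,
    "right_leg": 6,
}
STATE_DIM = sum(HUMANOID_SLOTS.values())


def to_universal_state(parts):
    """Map per-part readings of one embodiment into the shared layout.

    parts: dict of slot name -> list of floats for the parts the robot has.
    Returns (state, mask), two lists of length STATE_DIM. Entries the robot
    does not provide are zero-filled in `state` and set to 0 in `mask`.
    """
    unknown = set(parts) - set(HUMANOID_SLOTS)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")

    state, mask = [], []
    for name, width in HUMANOID_SLOTS.items():
        values = list(parts.get(name, []))
        if len(values) > width:
            raise ValueError(f"{name}: {len(values)} values exceed slot width {width}")
        pad = width - len(values)
        state.extend(values + [0.0] * pad)
        mask.extend([1] * len(values) + [0] * pad)
    return state, mask


# A single-arm robot with a 6-DoF arm and a gripper, aligned to the right side.
single_arm = {"right_arm": [0.1] * 6, "right_hand": [0.5]}
state, mask = to_universal_state(single_arm)
```

With a layout like this, a single-arm manipulator and a full humanoid produce vectors of the same length, so one model can train on both while the mask tells it which entries to ignore.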

Key Contributions

Humanoid-aligned universal state representation enabling cross-embodiment scalability
Mixture-of-Experts Unified Proprioceptive Predictor for whole-body coordination modeling
Lightweight history-token mechanism for efficient temporal context during inference
State-of-the-art performance on real-world humanoid manipulation tasks, especially in fast-reaction and long-horizon scenarios
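
The Mixture-of-Experts predictor can be illustrated with a generic, densely gated mixture: a gate scores each expert from the current state, and the prediction is the gate-weighted sum of the expert outputs. This summary does not specify HEX's routing scheme, expert architecture, or training objective, so the class below (`MoEPredictor`, with linear experts and random untrained weights) is a hypothetical sketch of the general technique only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class MoEPredictor:
    """Dense soft-gated mixture of linear experts (illustrative only)."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.dim = dim
        self.gate = rand_matrix(num_experts, dim)  # one gate score per expert
        self.experts = [rand_matrix(dim, dim) for _ in range(num_experts)]

    def gate_weights(self, state):
        """Mixing weights over experts; non-negative and summing to 1."""
        return softmax(matvec(self.gate, state))

    def predict(self, state):
        """Predict the next state as the gate-weighted sum of expert outputs."""
        weights = self.gate_weights(state)
        outputs = [matvec(W, state) for W in self.experts]
        return [
            sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(self.dim)
        ]


model = MoEPredictor(dim=4, num_experts=3)
current = [0.2, -0.1, 0.0, 0.5]
weights = model.gate_weights(current)
next_state = model.predict(current)
```

In a trained model, different experts could specialize in different coordination patterns, with the gate selecting among them per input; sparse top-k routing is a common alternative to the dense mixture shown here.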

Significance

HEX addresses a critical gap in VLA research: most existing approaches treat robot body parts independently, whereas practical humanoid deployment requires coordinated whole-body control.
