Weekly Research Digest — 2026-06-04
11 new entries this week across 3 topic areas.
Vision-Language-Action (VLA) Models
| Release | Venue | Significance |
|---|---|---|
| π0.7: Steerable Generalist Robotic Foundation Model | arXiv / Physical Intelligence | First VLA to show convincing compositional generalization: it combines skills zero-shot to solve unseen tasks |
| HALO: Unified VLA for Embodied Multimodal CoT | ICML 2026 | MoT architecture unifying text reasoning, visual subgoal prediction, and action in one model |
| ProgressVLA: Progress-Guided Diffusion Policy | arXiv / Microsoft Research | Injects explicit task-progress awareness into a VLA diffusion policy via a pre-trained estimator |
| AtomVLA: Scalable Post-Training via Predictive Latent World Models | arXiv | First subtask-aware VLA with a scalable offline post-training pipeline that eliminates the need for online RL |
| HEX: Humanoid-Aligned Experts for Whole-Body Manipulation | arXiv | MoE-based VLA for full-body humanoid coordination across diverse embodiments |
| ECHO: Continuous Hierarchical Memory for VLAs | arXiv | Hyperbolic-space memory tree for efficient experience retrieval in long-horizon tasks |
| AHEAD: Latent-Space Predictive World Model for Dynamic VLA Manipulation | arXiv / CMU | Patch-level latent world model wrapper enabling frozen VLAs to handle moving objects |
World Models for Robotics
| Release | Venue | Significance |
|---|---|---|
| V-JEPA 2.1: Unlocking Dense Features in Video SSL | arXiv / Meta FAIR | 20-point real-robot grasping improvement over V-JEPA 2 AC via better spatiotemporal representations |
| DDP-WM: Disentangled Dynamics Prediction for World Models | CVPR 2026 | ~9× inference speedup over dense Transformers by separating foreground dynamics from background |
Reinforcement Learning for Robotics
| Release | Venue | Significance |
|---|---|---|
| SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot RL | arXiv | Zero-shot online RL for novel tasks using only a video-language reasoning model as the reward signal |
| Can VLA Models Learn from Real-World Data Continually without Forgetting? | arXiv | First real-world continual learning benchmark for VLAs; documents significant catastrophic forgetting |
Generated automatically. All entries verified via web search.