Weekly Research Digest — 2026-06-04

11 new entries this week across 3 topic areas.

Vision-Language-Action (VLA) Models

Release	Venue	Significance
π0.7: Steerable Generalist Robotic Foundation Model	arXiv / Physical Intelligence	First VLA showing convincing compositional generalization: combines skills zero-shot for unseen tasks
HALO: Unified VLA for Embodied Multimodal CoT	ICML 2026	MoT architecture unifying text reasoning, visual subgoal prediction, and action in one model
ProgressVLA: Progress-Guided Diffusion Policy	arXiv / Microsoft Research	Injects explicit task-progress awareness into a VLA diffusion policy via a pre-trained estimator
AtomVLA: Scalable Post-Training via Predictive Latent World Models	arXiv	First subtask-aware VLA with a scalable offline post-training pipeline that eliminates the need for online RL
HEX: Humanoid-Aligned Experts for Whole-Body Manipulation	arXiv	MoE-based VLA for full-body humanoid coordination across diverse embodiments
ECHO: Continuous Hierarchical Memory for VLAs	arXiv	Hyperbolic-space memory tree for efficient experience retrieval in long-horizon tasks
AHEAD: Latent-Space Predictive World Model for Dynamic VLA Manipulation	arXiv / CMU	Patch-level latent world model wrapper enabling frozen VLAs to handle moving objects

World Models for Robotics

Release	Venue	Significance
V-JEPA 2.1: Unlocking Dense Features in Video SSL	arXiv / Meta FAIR	20-point real-robot grasping improvement over V-JEPA 2 AC via better spatiotemporal representations
DDP-WM: Disentangled Dynamics Prediction for World Models	CVPR 2026	~9× inference speedup over dense Transformers by separating foreground dynamics from background

Reinforcement Learning for Robotics

Release	Venue	Significance
SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot RL	arXiv	Zero-shot online RL for novel tasks using only a video-language reasoning model as reward signal
Can VLA Models Learn from Real-World Data Continually without Forgetting?	arXiv	First real-world continual learning benchmark for VLAs; documents significant catastrophic forgetting

Generated automatically. All entries verified via web search.