Weekly Research Digest — 2026-06-04

11 new entries this week across 3 topic areas.


Vision-Language-Action (VLA) Models

| Release | Venue | Significance |
| --- | --- | --- |
| π0.7: Steerable Generalist Robotic Foundation Model | arXiv / Physical Intelligence | First VLA showing convincing compositional generalization; combines skills zero-shot for unseen tasks |
| HALO: Unified VLA for Embodied Multimodal CoT | ICML 2026 | MoT architecture unifying text reasoning, visual subgoal prediction, and action in one model |
| ProgressVLA: Progress-Guided Diffusion Policy | arXiv / Microsoft Research | Injects explicit task-progress awareness into a VLA diffusion policy via a pre-trained estimator |
| AtomVLA: Scalable Post-Training via Predictive Latent World Models | arXiv | First subtask-aware VLA with a scalable offline post-training pipeline that removes the need for online RL |
| HEX: Humanoid-Aligned Experts for Whole-Body Manipulation | arXiv | MoE-based VLA for full-body humanoid coordination across diverse embodiments |
| ECHO: Continuous Hierarchical Memory for VLAs | arXiv | Hyperbolic-space memory tree for efficient experience retrieval in long-horizon tasks |
| AHEAD: Latent-Space Predictive World Model for Dynamic VLA Manipulation | arXiv / CMU | Patch-level latent world model wrapper enabling frozen VLAs to handle moving objects |

World Models for Robotics

| Release | Venue | Significance |
| --- | --- | --- |
| V-JEPA 2.1: Unlocking Dense Features in Video SSL | arXiv / Meta FAIR | 20-point real-robot grasping improvement over V-JEPA 2 AC via better spatiotemporal representations |
| DDP-WM: Disentangled Dynamics Prediction for World Models | CVPR 2026 | ~9× inference speedup over dense Transformers by separating foreground dynamics from background |

Reinforcement Learning for Robotics

| Release | Venue | Significance |
| --- | --- | --- |
| SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot RL | arXiv | Zero-shot online RL for novel tasks using only a video-language reasoning model as reward signal |
| Can VLA Models Learn from Real-World Data Continually without Forgetting? | arXiv | First real-world continual learning benchmark for VLAs; documents significant catastrophic forgetting |

Generated automatically. All entries verified via web search.