Weekly Research Digest — 2026-05-11

8 new entries this week across 3 topic areas.


Vision-Language-Action (VLA) Models

Release | Venue | Significance
Anticipation-VLA (anticipation-vla-long-horizon-embodied-tasks-subgoal-generation) | arXiv 2605.01772 | Hierarchical VLA with adaptive anticipation model that recursively generates visual subgoals to tackle compounding errors in long-horizon tasks
RoboAlign (roboalign-test-time-reasoning-language-action-alignment-vla) | arXiv 2603.21341 | Two-stage SFT+RL framework that aligns language-action token representations, yielding 106.6% real-world improvement over SFT baseline with <1% extra data
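
A note on reading the RoboAlign figure: if the 106.6% is a relative improvement (an assumption; the entry does not say how it was computed), it means the post-RL result is roughly 2.07 times the SFT baseline, not 106.6% absolute success. The sketch below works through that arithmetic. The baseline value of 0.30 is purely hypothetical and is not taken from the paper.

```python
def relative_improvement_pct(baseline: float, improved: float) -> float:
    """Percent change of `improved` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


def implied_multiplier(improvement_pct: float) -> float:
    """Factor by which the baseline is scaled for a given relative improvement."""
    return 1.0 + improvement_pct / 100.0


# Hypothetical baseline success rate, chosen only for illustration.
baseline_success = 0.30
improved_success = baseline_success * implied_multiplier(106.6)

print(f"multiplier: {implied_multiplier(106.6):.3f}")
print(f"hypothetical improved success rate: {improved_success:.3f}")
print(f"round trip: {relative_improvement_pct(baseline_success, improved_success):.1f}%")
```

Under this reading, any improvement above 100% means the result has more than doubled.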

World Models for Robotics

Release | Venue | Significance
Being-H0.7 (being-h07-latent-world-action-model-egocentric-videos) | arXiv 2605.00078 | Latent world-action model using learnable query slots as a compact reasoning interface, pretrained on large-scale egocentric video instead of pixel prediction
RoboAlign-R1 (roboalign-r1-multimodal-reward-alignment-robot-video-world-models) | arXiv 2605.03821 | Introduces RobotWorldBench and GRPO-based RL post-training to align video world models with decision-relevant quality metrics rather than pixel reconstruction
Do WAMs Generalize Better than VLAs? (do-world-action-models-generalize-better-than-vlas-robustness-study) | arXiv 2603.22078 | First large-scale robustness comparison showing WAMs outperform VLAs under visual and language perturbation on augmented LIBERO-Plus and RoboTwin 2.0-Plus benchmarks
MWM: Mobile World Models (mwm-mobile-world-models-action-conditioned-consistent-prediction) | arXiv 2603.07799 | Action-conditioned consistency post-training and ICSD distillation to eliminate rollout drift in navigation world models under multi-step planning
Mask World Model (mask-world-model-predicting-what-matters-robust-robot-policy-learning) | arXiv 2604.19683 | Predicts semantic masks instead of pixels in a video diffusion world model, imposing a geometric bottleneck that filters visual distractors and improves generalization
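
For readers unfamiliar with the GRPO step mentioned in the RoboAlign-R1 entry: the core idea in GRPO as commonly described is to score several samples for the same input and normalize each reward against its own group, so no learned value function is needed. The sketch below shows only that normalization. It is not RoboAlign-R1's implementation; the function name, the reward values, and the use of population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds scores for several samples generated from one input.
    `eps` guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four rollouts of the same prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Samples scoring above the group mean get positive advantages and are reinforced; those below get negative ones. When every sample in a group scores the same, all advantages are zero and the group contributes no learning signal.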

Reinforcement Learning for Robotics

Release | Venue | Significance
FlashSAC (flashsac-fast-stable-off-policy-rl-high-dimensional-robot-control) | arXiv 2604.04539 | Stabilized off-policy SAC with norm bounding and reduced gradient updates, outperforming PPO on 60+ tasks with humanoid sim-to-real training time cut from hours to minutes

Generated automatically. All entries verified via web search.