Weekly Research Digest — 2026-06-15

11 new entries this week across 3 topic areas.


Vision-Language-Action (VLA) Models

| Release | Venue | Significance |
|---|---|---|
| SeeTraceAct | arXiv 2606.02745 | Demo-conditioned VLA with visibility-aware end-effector trace prediction; introduces RoboCasa-DC cross-embodiment benchmark; +12.5pp real-world success |
| 3DThinkVLA | arXiv 2606.04436 | Injects latent 3D geometry and reasoning priors into VLAs via co-training + anchor token; fixes prompt-induced reasoning gap without backbone changes |
| AffordanceVLA | arXiv 2606.06155 | Which/Where/How2Act affordance modules + MoT architecture bridge VLM semantics to precise robot control; includes automated affordance data pipeline |
| MemoryVLA++ | arXiv 2606.09827 | Cognitive-science-inspired temporal VLA with working memory, episodic memory, and imagination; +9/26/28% on general/memory/imagination-dependent tasks |
| Hierarchical VLA Agents (Google DeepMind) | arXiv 2606.10267 | First systematic options-framework study of Hi-VLA design; distils practical principles for planner/controller interfaces across short- and long-horizon tasks |

World Models for Robotics

| Release | Venue | Significance |
|---|---|---|
| τ₀-WM (AgiBot) | arXiv 2606.01027 | 5B-parameter open robotic foundation model trained on 27.3K hours; unifies policy, video prediction, and action evaluation in one diffusion backbone |
| MotionWAM | arXiv 2606.09215 | Real-time (4.9 Hz, 7× faster than Cosmos Policy) unified WAM for humanoid loco-manipulation; removes upper/lower-body split with a single motion latent |
| Making Foresight Actionable (AGRA) | arXiv 2606.12217 | AGRA objective aligns video diffusion features to a semantic encoder to fix the reconstruction-vs-control representation mismatch in WAMs (HKU/XPENG) |
| RepWAM | arXiv 2606.13674 | Replaces reconstruction tokenisers in WAMs with semantically aligned visual-action tokenizers; strong gains across real-world manipulation and simulation |
| Targeting World Models (Adversarial) | arXiv 2606.09499 | First formal study of data-poisoning attacks through world models in robot learning pipelines; highlights critical supply-chain security gap |

Reinforcement Learning for Robotics

| Release | Venue | Significance |
|---|---|---|
| SARM2 + SPIRAL | arXiv 2606.10305 | Multi-task stage-aware reward model (MMoE + action-primitive vocabulary) + SPIRAL self-improvement loop; autonomous real-robot policy improvement without new demos |

Generated automatically. All entries verified via web search.