Weekly Research Digest — 2026-05-18
11 new entries this week across 3 topic areas.
Vision-Language-Action (VLA) Models
| Release | Venue | Significance |
|---|---|---|
| alam-algebraically-consistent-latent-action-model-vla ALAM | arXiv 2605.10819 | Algebraic structure on latent actions lifts MetaWorld MT50 success 47.9% → 85.0% |
| vla-forget-vision-language-action-unlearning-embodied VLA-Forget | arXiv 2604.03956 | First machine-unlearning framework targeting VLA models for safe post-deployment correction |
| from-pixels-to-tokens-latent-action-supervision-vla From Pixels to Tokens | arXiv 2605.04678 | Systematic study finding that image-based and action-based latent supervision each suit different task types |
| defi-disentangled-robot-learning-forward-inverse-dynamics-pretraining DeFI | ICLR 2026 | Decouples visual forward/inverse dynamics pretraining, letting VLAs exploit action-free web video |
World Models for Robotics
| Release | Venue | Significance |
|---|---|---|
| world-model-for-robot-learning-comprehensive-survey World Model Survey | arXiv 2605.00080 | Comprehensive multi-institution survey unifying world model roles in policy learning, planning, and data generation |
| lawm-least-action-world-models-long-horizon-physical-consistency LaWM | arXiv 2605.08279 | Variational integrator grounded in the principle of least action for physically consistent long-horizon prediction |
| one-token-per-frame-visual-bandwidth-world-models-vla-policy One Token Per Frame | arXiv 2605.07931 | Compresses world-model visual stream to 1 token/frame via adaptive pooling without performance loss |
| physically-native-world-models-hamiltonian-perspective Physically Native WMs | arXiv 2605.00412 | Proposes Hamiltonian World Models unifying video, 3D, and latent approaches under classical mechanics principles |
Reinforcement Learning for Robotics
| Release | Venue | Significance |
|---|---|---|
| scaling-sim-to-real-rl-robot-vla-generative-3d-worlds Scaling Sim-to-Real RL | arXiv 2603.18532 | Generative 3D worlds automate scene diversity for VLA RL fine-tuning; real-world success 21.7% → 75% |
| grounding-sim-to-real-generalization-dexterous-manipulation-vla Grounding Sim-to-Real | arXiv 2603.22876 | Rigorous empirical study across four sim-to-real axes for VLA dexterous manipulation policies |
| twinrl-vla-digital-twin-driven-rl-robotic-manipulation TwinRL-VLA | arXiv 2602.09023 | Smartphone-captured digital twin enables 100% real-world RL success in ~20 minutes per task |
Generated automatically. All entries verified via web search.