AtomVLA: Scalable Post-Training for Robotic Manipulation via Predictive Latent World Models

Summary

AtomVLA is the first subtask-aware vision-language-action (VLA) framework paired with a scalable offline post-training pipeline. It addresses the “instruction grounding gap” in VLA models, that is, the absence of explicit intermediate guidance, which leads to compounding errors in long-horizon tasks. It does so by decomposing tasks into atomic subtasks, guided by predictive latent world models during post-training.

Key Contributions

A subtask decomposition approach that bridges the instruction grounding gap between high-level language commands and low-level actions
A scalable offline post-training pipeline that uses predictive latent world models to generate intermediate supervision
Reduced compounding errors in long-horizon, multi-step manipulation tasks
Improvements on standard benchmarks, achieved without online environment interaction during post-training
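
To make the idea of world-model-derived intermediate supervision more concrete, here is a minimal illustrative sketch. It is not AtomVLA's actual implementation: the `Subtask` class, the additive toy dynamics in `predict_next_latent`, the negative squared-distance reward, and the example subtasks are all assumptions chosen for illustration. It shows only the general pattern of scoring logged candidate actions against a subtask goal in latent space, with no environment interaction.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


@dataclass
class Subtask:
    """Hypothetical atomic subtask with a goal expressed in latent space."""
    description: str
    goal_latent: Vector


def predict_next_latent(latent: Sequence[float], action: Sequence[float]) -> Vector:
    """Toy stand-in for a learned latent world model (assumption):
    the next latent is simply the current latent plus the action."""
    return [z + a for z, a in zip(latent, action)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subtask_reward(latent: Sequence[float], action: Sequence[float], subtask: Subtask) -> float:
    """Assumed reward: negative squared distance between the predicted
    next latent and the subtask's goal latent (closer is better)."""
    predicted = predict_next_latent(latent, action)
    return -squared_distance(predicted, subtask.goal_latent)


def score_offline_candidates(
    latent: Sequence[float], candidates: Sequence[Sequence[float]], subtask: Subtask
) -> List[float]:
    """Score candidate actions using only the world model's predictions,
    so no online rollout in the environment is needed."""
    return [subtask_reward(latent, action, subtask) for action in candidates]


# Hypothetical decomposition of "put the mug on the shelf" into atomic subtasks.
subtasks = [
    Subtask("reach the mug", [1.0, 0.0]),
    Subtask("place the mug on the shelf", [1.0, 1.0]),
]

current_latent = [0.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0]]

scores = score_offline_candidates(current_latent, candidates, subtasks[0])
best_index = max(range(len(scores)), key=scores.__getitem__)
print(best_index, scores)
```

In a sketch like this, the per-subtask scores could serve as the intermediate training signal for the policy, in place of a single sparse reward at the end of a long-horizon task.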

Significance

AtomVLA demonstrates that scalable offline post-training with structured subtask supervision can substantially improve VLA performance on complex tasks without the cost and complexity of online reinforcement learning (RL).
