Summary

This paper reveals that large vision-language-action (VLA) models, such as π₀ and GR00T-N1.5, exhibit severe layer-wise representational redundancy despite being trained on diverse physical trajectories. The authors introduce a training-free structural compression pipeline that uses Centered Kernel Alignment (CKA) to identify redundant "twin" layers and permanently remove them, cutting model depth by up to 50%.
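For readers unfamiliar with CKA, the following is a minimal sketch of the standard linear variant, which scores how similar two sets of activations are on a scale from 0 to 1. It is not taken from the paper: the function names and the list-of-lists activation format (rows are samples, columns are features) are assumptions made for illustration, and the authors may use a different kernel or estimator.

```python
import math


def center_columns(X):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def cross_frobenius_sq(X, Y):
    """Squared Frobenius norm of X^T Y for row-major lists of lists."""
    total = 0.0
    for x_col in zip(*X):
        for y_col in zip(*Y):
            dot = sum(a * b for a, b in zip(x_col, y_col))
            total += dot * dot
    return total


def linear_cka(X, Y):
    """Linear CKA between two activation matrices with the same number of rows.

    Assumes at least one non-constant feature in each matrix; otherwise the
    denominator is zero.
    """
    Xc, Yc = center_columns(X), center_columns(Y)
    numerator = cross_frobenius_sq(Xc, Yc)
    denominator = math.sqrt(
        cross_frobenius_sq(Xc, Xc) * cross_frobenius_sq(Yc, Yc)
    )
    return numerator / denominator


# Toy activations: 3 samples, 2 features (hypothetical values).
acts_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
acts_b = [[2.0 * v for v in row] for row in acts_a]  # same features, rescaled
print(linear_cka(acts_a, acts_b))
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of the features, which is why two layers can score near 1 even when their raw activations differ numerically.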

Key Contributions

  • Identifies widespread layer-wise redundancy in state-of-the-art VLA models via CKA analysis
  • Proposes a training-free structural compression pipeline requiring only a single forward pass
  • Achieves a 40–50% reduction in training time and up to 30% faster real-time inference
  • Matches or exceeds base model performance after compression, validated on downstream manipulation tasks
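To make the idea of CKA-guided layer removal concrete, here is a hedged sketch of one possible selection rule. It is not the paper's actual criterion: the function name `select_redundant_layers`, the greedy strategy, the `threshold` of 0.95, and the toy similarity matrix are all assumptions for illustration. The sketch walks through the layers in order and marks a layer as removable when its representation is nearly identical to that of the most recently kept layer, subject to a cap on how much depth may be removed.

```python
def select_redundant_layers(cka, threshold=0.95, max_drop_fraction=0.5):
    """Greedily pick layer indices to drop from a square CKA matrix.

    cka[i][j] is the similarity between the outputs of layers i and j.
    Layer 0 is always kept. The rule, threshold, and cap are hypothetical.
    """
    n = len(cka)
    budget = int(n * max_drop_fraction)  # maximum number of layers to drop
    dropped = []
    last_kept = 0
    for j in range(1, n):
        if len(dropped) < budget and cka[last_kept][j] >= threshold:
            dropped.append(j)  # layer j is a near-duplicate of a kept layer
        else:
            last_kept = j      # layer j adds new information; keep it
    return dropped


# Toy 4-layer similarity matrix (made-up values, not results from the paper).
toy_cka = [
    [1.00, 0.98, 0.60, 0.55],
    [0.98, 1.00, 0.62, 0.57],
    [0.60, 0.62, 1.00, 0.97],
    [0.55, 0.57, 0.97, 1.00],
]
print(select_redundant_layers(toy_cka))  # prints [1, 3]
```

In this toy example, layers 1 and 3 closely mirror layers 0 and 2 respectively, so half of the depth is marked for removal. Comparing against the last kept layer, rather than the immediate predecessor, avoids chaining through layers that have already been dropped.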

Significance

The work demonstrates that current VLA architectures are over-parameterized for fine-tuning and offers a compute-efficient path to deploying billion-parameter robot policies on resource-constrained hardware.