Ambient Diffusion Policy: A New Robotics Method for Imitation Learning from Suboptimal Data

Why We Picked It

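Robotics teams now have a practical method for making efficient use of suboptimal data. Ambient Diffusion Policy addresses the problem that low-quality data is hard to train on, so developers working on robot imitation learning can try it on their existing datasets, and it may substantially reduce data collection costs.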

AI Summary

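High-quality, task-specific data in robotics is expensive and hard to collect, while suboptimal data (lower-quality or out-of-distribution demonstrations) is abundant. Existing methods that co-train on both kinds of data often fail to separate the useful features in suboptimal samples from the harmful ones. Ambient Diffusion Policy introduces noise-dependent data usage: suboptimal data contributes to training only at high and low diffusion times, which lets the method extract just the useful features. The approach rests on the observation that robot action data exhibits a spectral power law, which induces two properties of the optimal Diffusion Policy that the method exploits: a global-to-local hierarchy and locality. Across six tasks and four types of suboptimal data (noisy trajectories, sim-to-real gap, task mismatch, and large-scale data mixtures), the method is reported to be effective, and on the Open X-Embodiment dataset it outperforms existing co-training baselines by up to 33%.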
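
The spectral power law mentioned above can be illustrated with a small check: fit a straight line to the power spectrum of an action trajectory in log-log space and read off the slope. This is only a sketch under assumptions. The function name `spectral_slope`, the averaging over action dimensions, and the least-squares fit are illustrative choices, not the procedure used in the paper.

```python
import numpy as np

def spectral_slope(actions):
    """Estimate the log-log slope of the power spectrum of a trajectory.

    actions: array of shape (T,) or (T, D); power is averaged over the D
    action dimensions. A clearly negative slope means power decays with
    frequency, which is what a power law would look like.
    (Hypothetical helper for illustration only.)
    """
    x = np.asarray(actions, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # Remove the mean so the DC bin does not dominate, then take the FFT over time.
    spectrum = np.fft.rfft(x - x.mean(axis=0), axis=0)
    power = (np.abs(spectrum) ** 2).mean(axis=1)[1:]  # drop the DC bin
    freqs = np.fft.rfftfreq(x.shape[0])[1:]
    keep = power > 0  # avoid log(0) on empty bins
    slope, _intercept = np.polyfit(np.log(freqs[keep]), np.log(power[keep]), 1)
    return float(slope)

# Synthetic trajectory whose amplitude at frequency k is 1/k, so power ~ k^-2.
n = np.arange(257)
k = np.arange(1, 129)
trajectory = (np.cos(2 * np.pi * np.outer(n, k) / 257) / k).sum(axis=1)
slope = spectral_slope(trajectory)
```

On real robot data one would run this over many trajectories and inspect how consistent the slope is; the synthetic signal here only confirms that the fit recovers a known exponent.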

Original · arXiv cs.AI

Ambient Diffusion Policy: Imitation Learning from Suboptimal Data in Robotics

We propose Ambient Diffusion Policy, a simple and principled method for imitation learning from suboptimal data in robotics. High-quality, task-specific robot data is expensive and time-consuming to collect, while suboptimal datasets with lower-quality or out-of-distribution demonstrations are abundant. Existing methods that co-train on both data sources in robotics often fail to separate the meaningful and the harmful features in the suboptimal samples. In contrast, our method extracts only the useful features by introducing a new axis to co-training in robotics: noise-dependent data usage. Ambient Diffusion Policy restricts the contribution of suboptimal data during training to only the high and low diffusion times. To rigorously justify our approach, we first observe that robot action data exhibits a spectral power law. This induces two important properties on the optimal Diffusion Policy that we exploit: a global-to-local hierarchy and locality. We theoretically formalize this discussion using a simplified model. Our experiments validate Ambient Diffusion Policy on four types of suboptimal action data (noisy trajectories, sim-to-real gap, task mismatch, and large-scale data mixtures) across six tasks. The results show that it effectively learns from arbitrary sources of suboptimal data. Notably, it outperforms existing co-training baselines by up to 33% when scaled to Open X-Embodiment - a large dataset with heterogeneous data quality and unstructured distribution shifts. Overall, Ambient Diffusion Policy increases the utility of suboptimal demonstrations and expands the set of usable data sources in robotics.
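
The noise-dependent data usage described in the abstract can be sketched as a per-sample loss mask. Everything below is an assumption made for illustration: the thresholds `t_low` and `t_high`, the function names, and the convention that diffusion time lies in [0, 1]. The abstract does not say how the high and low time ranges are chosen or how the loss is weighted, so this should not be read as the authors' implementation.

```python
import numpy as np

def usage_mask(t, is_suboptimal, t_low=0.2, t_high=0.8):
    """Return 1.0 where a sample may contribute to the loss, else 0.0.

    High-quality samples are used at every diffusion time. Suboptimal
    samples are used only when t <= t_low or t >= t_high.
    (t_low and t_high are hypothetical thresholds.)
    """
    t = np.asarray(t, dtype=float)
    is_suboptimal = np.asarray(is_suboptimal, dtype=bool)
    in_allowed_range = (t <= t_low) | (t >= t_high)
    return np.where(is_suboptimal & ~in_allowed_range, 0.0, 1.0)

def masked_loss(per_sample_loss, t, is_suboptimal, t_low=0.2, t_high=0.8):
    """Average the per-sample denoising loss over the samples the mask keeps."""
    mask = usage_mask(t, is_suboptimal, t_low, t_high)
    per_sample_loss = np.asarray(per_sample_loss, dtype=float)
    # Guard against an all-zero mask so the division is always defined.
    return float((mask * per_sample_loss).sum() / max(mask.sum(), 1.0))

# A batch of four samples: three suboptimal, one high-quality.
t = [0.1, 0.5, 0.9, 0.5]
is_suboptimal = [True, True, True, False]
mask = usage_mask(t, is_suboptimal)
loss = masked_loss([1.0, 2.0, 3.0, 4.0], t, is_suboptimal)
```

In this batch the suboptimal sample drawn at the intermediate time 0.5 is dropped, while the high-quality sample at the same time is kept. In a real training loop the per-sample loss would come from the diffusion model's denoising objective.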
