DOPD: A Dual On-Policy Distillation Method Proposed to Tackle Privileged Hallucination

Why it's featured

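This paper proposes DOPD, a new distillation method that mitigates privileged hallucination through per-token supervision and performs better on both LLMs and VLMs. It should interest researchers working on model compression.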

AI summary

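DOPD is an advantage-aware dual distillation paradigm. It dynamically routes token-level supervision signals, allocating them between a privileged-teacher policy and a privileged-student policy, which mitigates the privileged hallucination problem found in conventional on-policy distillation. Experiments on LLMs (e.g., GPT-2) and VLMs (e.g., CLIP) show that DOPD consistently outperforms vanilla OPD on metrics including stability and robustness.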
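The summary says DOPD routes each token's supervision to either the privileged teacher or the privileged student based on advantage, but it does not give the routing rule. The sketch below only illustrates what per-token routing between two supervision sources could look like. The function name `route_supervision`, its inputs, and the "pick the source with the larger advantage estimate" rule are all hypothetical assumptions, not the paper's method.

```python
from typing import List, Sequence, Tuple


def route_supervision(
    teacher_targets: Sequence[float],
    priv_student_targets: Sequence[float],
    teacher_adv: Sequence[float],
    priv_student_adv: Sequence[float],
) -> Tuple[List[str], List[float]]:
    """Pick one supervision source per token (HYPOTHETICAL routing rule).

    Each *_targets entry is a per-token supervision value (e.g. the log-prob a
    supervisor assigns to the sampled token); each *_adv entry is a per-token
    advantage estimate for that source. Both the inputs and the argmax rule are
    illustrative assumptions, not the paper's method.
    """
    n = len(teacher_targets)
    if not (len(priv_student_targets) == len(teacher_adv) == len(priv_student_adv) == n):
        raise ValueError("all per-token sequences must have the same length")

    sources: List[str] = []
    targets: List[float] = []
    for t in range(n):
        # Ties go to the privileged teacher (an arbitrary choice for this sketch).
        if teacher_adv[t] >= priv_student_adv[t]:
            sources.append("teacher")
            targets.append(teacher_targets[t])
        else:
            sources.append("privileged_student")
            targets.append(priv_student_targets[t])
    return sources, targets


sources, targets = route_supervision(
    teacher_targets=[-1.0, -2.0, -0.7],
    priv_student_targets=[-0.5, -0.1, -0.9],
    teacher_adv=[0.3, -0.2, 0.0],
    priv_student_adv=[0.1, 0.4, 0.0],
)
print(sources)  # ['teacher', 'privileged_student', 'teacher']
print(targets)  # [-1.0, -0.1, -0.7]
```

A hard argmax is used only because it is the simplest routing rule to read. A real method might instead weight the two sources softly or use a different advantage estimator.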

arXiv cs.AI
On-policy distillation (OPD) offers superior capacity transfer by supervising student-sampled trajectories with dense token-level signals. To furnish high-quality supervision sources and thereby elevate the performance f…
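The excerpt describes OPD as supervising student-sampled trajectories with dense token-level signals. One common way to produce such a signal, assumed here because the excerpt does not say which divergence the paper uses, is a per-token reverse KL between the student's and the teacher's next-token distributions, evaluated at every position of a sequence the student itself generated. The function names and toy logits below are illustrative only.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_reverse_kl(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) over the vocabulary at a single position."""
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must share the same vocabulary")
    log_ps = log_softmax(student_logits)
    log_pt = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(log_ps, log_pt))


def opd_token_losses(
    student_logits_seq: Sequence[Sequence[float]],
    teacher_logits_seq: Sequence[Sequence[float]],
) -> List[float]:
    """Dense per-token losses along ONE trajectory sampled from the student.

    Both arguments hold logits at each position of the same student-generated
    prefix; the teacher only scores that prefix, it never generates it.
    """
    if len(student_logits_seq) != len(teacher_logits_seq):
        raise ValueError("student and teacher must score the same positions")
    return [token_reverse_kl(s, t) for s, t in zip(student_logits_seq, teacher_logits_seq)]


# Toy 3-token vocabulary, 2-step trajectory.
losses = opd_token_losses(
    student_logits_seq=[[2.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
    teacher_logits_seq=[[0.0, 2.0, 0.0], [0.5, 0.5, 0.5]],
)
print(losses)  # first entry > 0; second entry is 0.0 (identical distributions)
```

Each position yields its own loss term, which is what "dense" supervision means here, in contrast to a single sequence-level reward. In a real setup these per-token losses would be averaged and backpropagated through the student only.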