Democratic ICAI: A Debate-Based Method for Extracting Principles from Preferences

Why It Was Selected

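This paper applies debate to AI alignment. The approach captures more nuance than a single-pass explanation and predicts preferences more accurately on creative tasks, so it is worth a look for anyone working on alignment research.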

AI Summary

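Democratic ICAI uses structured, role-based debates to gather multiple competing rationales, which it then uses to extract natural-language principles from human preferences. On the creative preference benchmarks MuCE-Pref and LiTBench, the method improves preference prediction accuracy across several creative task categories. Compared with deliberative prompting and principle-based baselines, Democratic ICAI yields a more faithful preference structure. LLM annotators also prefer the constitutions it generates.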

AI Translation · Chinese

arXiv cs.LG

Preference-based alignment often struggles to capture the reasoning that underlies human judgments. Many evaluations rely on multiple interacting criteria, yet pairwise labels reveal only the final choice rather than the…
