Explaining Transformer Attention Mechanisms with Program Synthesis

Why It Was Selected

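This paper explains how individual attention heads work by describing them as Python programs. The programs can also be dropped in to replace the original heads while closely reproducing their behavior. Worth reading if you want to look inside a model's internal mechanisms.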

AI Summary

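The researchers propose reverse-engineering Transformer attention heads with program synthesis. They first compute each head's attention matrix, then prompt a pretrained language model to generate a Python program that reproduces that attention pattern. Across GPT-2, TinyLlama-1.1B, and Llama-3B, fewer than 1,000 programs reach an average IoU (intersection over union) above 75%. Replacing 25% of the attention heads with their programs raises perplexity by only 16%, and performance on downstream question-answering benchmarks is maintained.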
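To make the evaluation described above more concrete, here is a minimal Python sketch of scoring a symbolic program against an observed attention pattern. Several pieces are assumptions, not the paper's method: IoU is computed here after binarizing weights at a fixed threshold of 0.5; the "program" is a hypothetical previous-token rule; and `apply_head` is a simplified head replacement that mixes value vectors with the program's attention matrix while ignoring the output projection and multi-head concatenation. All function names (`previous_token_program`, `binarize`, `iou`, `apply_head`) are illustrative.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def previous_token_program(tokens: List[str]) -> Matrix:
    """Hypothetical symbolic program: each query attends to the previous token.

    Position 0 has no predecessor, so it attends to itself. A real synthesized
    program could also inspect token contents; this one only uses positions.
    """
    n = len(tokens)
    att = [[0.0] * n for _ in range(n)]
    for i in range(n):
        j = i - 1 if i > 0 else 0
        att[i][j] = 1.0
    return att


def binarize(att: Matrix, threshold: float = 0.5) -> Set[Tuple[int, int]]:
    """Return the (query, key) pairs whose attention weight exceeds the threshold."""
    return {(i, j) for i, row in enumerate(att) for j, w in enumerate(row) if w > threshold}


def iou(pred: Matrix, target: Matrix, threshold: float = 0.5) -> float:
    """Intersection over union of the two binarized attention patterns (assumed metric)."""
    a, b = binarize(pred, threshold), binarize(target, threshold)
    union = a | b
    if not union:
        return 1.0  # both patterns empty: treat as a perfect match
    return len(a & b) / len(union)


def apply_head(att: Matrix, values: Matrix) -> Matrix:
    """Simplified head replacement: output[i] = sum_j att[i][j] * values[j]."""
    dim = len(values[0])
    return [
        [sum(att[i][j] * values[j][d] for j in range(len(values))) for d in range(dim)]
        for i in range(len(att))
    ]


# Toy example: a causal attention matrix "observed" from some head (made-up numbers).
tokens = ["The", "cat", "sat", "down"]
observed = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.2, 0.1, 0.3, 0.4],
]
program_att = previous_token_program(tokens)
print(iou(program_att, observed))  # 0.75

values = [[1.0], [2.0], [3.0], [4.0]]
print(apply_head(program_att, values))  # [[1.0], [1.0], [2.0], [3.0]]
```

In this toy case the program matches three of the four binarized positions (the last row of `observed` has no weight above 0.5), giving an IoU of 0.75. Binarizing before computing IoU is only one way to compare soft attention maps; a top-k or per-row argmax rule would be an equally plausible assumption.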


arXiv cs.LG

A longstanding goal of research on interpretable deep learning is to replace opaque neural computations with human-meaningful symbolic descriptions. In this paper, we propose an approach for approximating the behavior of…
