Grad Detect: Gradient-Based Hallucination Detection for LLMs

Why It's Featured

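This paper shows how to catch hallucinations with gradient signals, which it finds markedly more accurate than confidence-based scores. It also finds that the last 5 layers are enough, which saves compute.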

AI Summary

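Grad Detect detects hallucinations by analyzing the layer-wise gradient patterns of a large language model during inference. On several Q&A benchmarks (such as TriviaQA and Natural Questions), Grad Detect outperforms confidence-based and sampling-based baselines on both hallucination detection and model abstention prediction. Layer ablations covering 11 models and 4 architectures show that the last 5 layers hold more than 97% of the discriminative gradient signal, which makes efficient deployment possible. The method offers a unified framework for evaluating LLM reliability that combines strong predictive performance with interpretability.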
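The summary does not say exactly how the gradients are computed or turned into features, so the following is only a minimal sketch under stated assumptions. A toy tanh MLP stands in for an LLM. The loss is the negative log-likelihood of the model's own greedy prediction, so no gold answer is needed at inference time. The per-layer feature is the Frobenius norm of that layer's weight gradient. These choices, and the names `layer_gradient_features` and `k`, are hypothetical and are not the paper's method. The `k` argument illustrates where the "last 5 layers" saving would come from: the forward pass still runs through every layer, but backpropagation can stop after the last `k` layers.

```python
import numpy as np


def forward(weights, w_out, x):
    """Toy stand-in for an LLM: a stack of tanh layers plus an output projection.

    Returns all activations (index 0 is the input) and the output logits.
    """
    acts = [x]
    for W in weights:
        acts.append(np.tanh(W @ acts[-1]))
    return acts, w_out @ acts[-1]


def layer_gradient_features(weights, w_out, x, k=None):
    """Per-layer gradient norms of the NLL of the model's own top prediction.

    Hypothetical feature design (not the paper's): for each of the last k
    hidden layers, record the Frobenius norm of dLoss/dW_l. Backpropagation
    stops after k layers, which is where the compute saving comes from.
    """
    n = len(weights)
    k = n if k is None else min(k, n)
    acts, logits = forward(weights, w_out, x)  # full forward pass is still needed

    # Numerically stable softmax over the output logits.
    z = np.exp(logits - logits.max())
    probs = z / z.sum()
    target = int(np.argmax(probs))  # score the model against its own greedy output

    # d(-log p_target) / d(logits) = probs - one_hot(target)
    d_logits = probs.copy()
    d_logits[target] -= 1.0
    d_h = w_out.T @ d_logits  # gradient w.r.t. the last hidden activation

    norms = []
    for l in range(n - 1, n - 1 - k, -1):  # walk backwards over the last k layers only
        h = acts[l + 1]
        d_pre = d_h * (1.0 - h ** 2)                         # back through tanh
        norms.append(np.linalg.norm(np.outer(d_pre, acts[l])))  # ||dLoss/dW_l||_F
        d_h = weights[l].T @ d_pre                           # propagate one layer down
    return np.array(norms[::-1])  # ordered from shallowest to deepest layer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = [rng.normal(scale=0.5, size=(8, 8)) for _ in range(12)]
    w_out = rng.normal(scale=0.5, size=(5, 8))
    x = rng.normal(size=8)

    full = layer_gradient_features(weights, w_out, x)       # all 12 layers
    tail = layer_gradient_features(weights, w_out, x, k=5)  # last 5 layers only
    print(full.shape, tail.shape)
    print(np.allclose(full[-5:], tail))  # truncated backprop gives identical tail features
```

In a full pipeline these feature vectors would presumably be fed to a classifier (for example, logistic regression) trained on answers labeled as hallucinated or correct. That step is likewise an assumption here. The truncated variant returns exactly the last `k` entries of the full feature vector, so dropping the earlier layers changes cost, not the features that remain.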


arXiv cs.LG: Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks, yet they remain prone to generating hallucinations. Detecting these hallucinations is critical for deploying LLMs reliably in h…
