Theoria: Rewrite-Acceptability Verification over Informal Reasoning States

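Why it was selected

Theoria audits every reasoning step in an AI answer and reaches 91.4% precision on HLE. Its verdicts are more traceable than those of a pure LLM judge, and it specifically targets hidden premises and fabricated citations.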

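AI summary

Theoria is a verification architecture that rewrites a candidate solution as a sequence of typed state transitions. Each transition must supply an explicit justification (a citation, a computation, or a given fact) and can be audited independently. On HLE-Verified Gold (185 text-only expert questions), Theoria certified 105 questions with a strict precision of 91.4% (Wilson 95% CI [84.5%, 95.4%]). Its errors overlap little with those of holistic LLM judges (Jaccard 0.14-0.36), so the two approaches are complementary. On 95 adversarial poisoned proofs, the structural judge caught 94.7%, compared with 83.2% for the holistic judge (p=0.0017). On GPQA Diamond (n=65), certified precision was 97.1%.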
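The Wilson interval quoted for the HLE result can be checked with a few lines of Python. This is a minimal sketch, not the paper's evaluation code. It assumes that 96 of the 105 certified answers were correct. The source does not state that count; it is inferred here from the reported 91.4% precision. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.959964 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    # Center is the observed proportion shrunk slightly toward 0.5.
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumption: 96 correct out of 105 certified (inferred from 91.4%, not stated in the source).
correct, certified = 96, 105
low, high = wilson_interval(correct, certified)
print(f"precision = {correct / certified:.1%}, 95% CI = [{low:.1%}, {high:.1%}]")
```

Under that assumption the computed interval rounds to the reported [84.5%, 95.4%]. The Wilson form is a common choice when the proportion is close to 1, because the simpler normal-approximation interval can extend past 100%.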


arXiv cs.AI

When should an AI system's answer be trusted? Formal proof assistants offer certainty but cannot reach most of the problem distribution; scalar LLM judges offer coverage but produce opaque scores that cannot be audited a…
