Google LEAP: a general-purpose LLM solves all 12 Putnam 2025 problems and lifts Lean-IMO-Bench to 70%

Why it was selected

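Teams working on mathematical reasoning or agent development should take note: LEAP outperforms a specialized system using only a general-purpose model plus a verification feedback loop, which suggests that the design of the agent framework matters more than the model itself. The paper is worth opening for the specifics of the architecture.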

AI summary

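Google has released new research called LEAP (Lean-Enhanced Agentic Programming). It wraps a general-purpose large language model in an agent framework that verifies every step against the Lean compiler and iterates on the verifier's feedback. With this framework, the same general-purpose model solved all 12 problems of the Putnam 2025 mathematics competition and raised the one-shot solve rate on Lean-IMO-Bench from under 10% to 70%, surpassing a specialized gold-medal system that scored 48%. The work shows how much a custom agent framework can contribute on mathematical reasoning tasks. The paper is available on arXiv.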
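The verify-and-iterate idea described in the summary can be sketched as a generic loop. This is a minimal illustration, not LEAP's actual implementation: the names `refine_until_verified`, `CheckResult`, `propose`, and `verify` are all hypothetical, and the toy stubs stand in for a real LLM call and a real Lean compiler invocation, neither of which is specified in the summary.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    """Outcome of one verifier run (hypothetical structure)."""
    ok: bool
    feedback: str  # e.g. compiler error text; empty when ok is True


def refine_until_verified(
    problem: str,
    propose: Callable[[str, list[str]], str],
    verify: Callable[[str], CheckResult],
    max_rounds: int = 8,
) -> Optional[str]:
    """Ask for candidates until one passes the verifier or the budget runs out.

    `propose` stands in for the LLM and receives all earlier feedback;
    `verify` stands in for a Lean compiler check.
    """
    feedback_log: list[str] = []
    for _ in range(max_rounds):
        candidate = propose(problem, feedback_log)
        result = verify(candidate)
        if result.ok:
            return candidate
        # Keep the verifier's message so the next proposal can react to it.
        feedback_log.append(result.feedback)
    return None


# Toy stand-ins (assumptions for illustration only, not real Lean or LLM calls).
def toy_propose(problem: str, feedback_log: list[str]) -> str:
    # First attempt is a placeholder; after any feedback, try a real tactic.
    return "rfl" if feedback_log else "sorry"


def toy_verify(candidate: str) -> CheckResult:
    if candidate == "rfl":
        return CheckResult(ok=True, feedback="")
    return CheckResult(ok=False, feedback="declaration uses 'sorry'")
```

With the toy stubs, `refine_until_verified("theorem t : 1 + 1 = 2", toy_propose, toy_verify)` fails on the first round, feeds the error back, and accepts the second candidate. The point of the design is that acceptance is decided only by the verifier, never by the model's own judgment of its output.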

AI translation · Chinese

elvis: New research from Google. Just shows the impressive results you can get from custom agent harnesses. LEAP wraps a general-purpose LLM in an agentic scaffold that grounds every step in the Lean compiler and iterates again…
