FrontierMath v2 is live: GPT-5.5 and Google AI lead

Why it was selected

A math benchmark update, with strong results from GPT-5.5 and Google's AI

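AI summary

Epoch AI has released v2 of the FrontierMath benchmark, following an audit that addressed errors in 42% of its problems. In the new version, GPT-5.5 (xhigh) reaches 85% accuracy on Tiers 1-3, and Google's AI co-mathematician reaches 76% on Tier 4. Scores rose across the board, while the rankings stayed largely the same.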

Original post

Epoch AI: FrontierMath: Tiers 1–4 (v2) is live. We concluded an audit that addressed errors in 42% of problems. Rankings are similar but scores are higher across the board. The current leaders are GPT-5.5 (xhigh) with 85% on Tiers…
