Claude Opus 4.8 takes #2 on DeepSWE Bench, leading on efficiency and reliability

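Why it was selected

Claude Opus 4.8 posted 58% Pass@1 on DeepSWE Bench, making a case for itself as a reference point for efficiency and reliability. Teams choosing an AI coding model can use it as a cost-effectiveness baseline.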

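AI summary

Claude Opus 4.8 scored 58% Pass@1 on DeepSWE Bench, ranking second behind GPT-5.5. The model trails slightly on raw score but shows the highest reliability and efficiency across several recent benchmarks. The result continues a recent trend: alongside the push for peak performance, models are placing more weight on stability and resource efficiency in real-world use. For developers following AI coding and model selection, it is a signal worth noting.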


elvis: The efficiency frontier! Where do you think GPT-5.6 will land?

CHOI (@arrakis_ai): Claude Opus 4.8 has landed on DeepSWE Bench, posting a 58% Pass@1 and taking #2 overall behind GPT-5.5. It continues a broader trend: sligh…

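Decoder · 05-28 21:20 · original
shao__meng · 05-29 00:55 · original
岚叔 · 05-29 02:59 · original
lmarena.ai · 05-28 22:15 · original
Poe · 05-28 22:25 · original
IT之家 · 05-28 22:52 · original
Simon Willison’s Weblog · 05-28 23:59 · original
Anthropic: Newsroom · 05-29 00:05 · original
berryxia · 05-29 02:07 · original
AI Will · 05-29 02:41 · original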
