AA-Briefcase benchmark: Claude Fable 5 leads but costs $31 per task; DeepSeek V4 Flash costs only $0.04

Why it was selected

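The new AA-Briefcase benchmark tests long-horizon projects: Claude Fable 5 is the strongest but expensive, DeepSeek V4 Flash is extremely cheap, and GLM-5.2 offers excellent value for money.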

AI summary

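The AA-Briefcase benchmark evaluates how models perform on long-horizon knowledge-work projects; cost per task varies by roughly 800x across the models tested. Claude Fable 5 leads with 1587 Elo but costs more than $31 per task on average; Claude Opus 4.8 scores 1356 at $10.40 per task. DeepSeek V4 Flash costs only about $0.04 per task, giving it the best price/performance ratio. GLM-5.2 scores 1266 at $2.40 per task: 90 Elo behind Claude Opus 4.8 at less than 25% of its cost.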
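The ratios in the summary can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the figures quoted above as given, the dictionary layout and helper names (`models`, `cost_ratio`, `elo_gap`) are hypothetical, and DeepSeek V4 Flash's Elo score is left as `None` because the text does not state it.

```python
# Figures as quoted in the summary: (Elo score, average cost per task in USD).
# DeepSeek V4 Flash's Elo is not given in the text, so it is left as None.
models = {
    "Claude Fable 5": (1587, 31.00),
    "Claude Opus 4.8": (1356, 10.40),
    "GLM-5.2": (1266, 2.40),
    "DeepSeek V4 Flash": (None, 0.04),
}


def cost_ratio(a: str, b: str) -> float:
    """Return how many times more model `a` costs per task than model `b`."""
    return models[a][1] / models[b][1]


def elo_gap(a: str, b: str) -> int:
    """Return the Elo difference between model `a` and model `b`."""
    return models[a][0] - models[b][0]


# Cost spread between the most and least expensive models quoted.
spread = cost_ratio("Claude Fable 5", "DeepSeek V4 Flash")
# GLM-5.2 compared with Claude Opus 4.8: score gap and share of cost.
gap = elo_gap("Claude Opus 4.8", "GLM-5.2")
share = cost_ratio("GLM-5.2", "Claude Opus 4.8")

print(f"cost spread: {spread:.0f}x")
print(f"Elo gap: {gap}")
print(f"GLM-5.2 cost as share of Opus 4.8: {share:.1%}")
```

With the rounded figures the spread comes out near 775x, in line with the quoted "~800x" given that the source states "more than $31" and "~$0.04" rather than exact values.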

Clement Delangue: “Cost per task varies by ~800x across models tested: Claude Fable 5 leads the benchmark but costs more than $31 per task on average, compared to ~$0.04 for DeepSeek V4 Flash (max). The strongest price/performance options…

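IT之家, 06-20 00:04 (original post)
lmarena.ai, 06-17 20:21 (original post)
Pandaily, 06-20 08:43 (original post)
shao__meng, 06-18 00:58 (original post)
@atomic_chat_hq, 06-18 05:00 (original post)
@koltregaskes, 06-18 18:17 (original post)
Epoch AI, 06-19 06:54 (original post)
Together AI, 06-19 10:40 (original post)
Aadit Sheth, 06-17 19:22 (original post)
Andrew Ng, 06-19 18:34 (original post)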
