Claude Fable 5 Tops the Agent Arena Leaderboard

Why It Was Selected

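Teams building AI agents finally have an evaluation benchmark driven by real tasks. Fable 5 decisively beats its rivals across 300,000 tasks, and its trade-off between strong execution and weak steerability is worth watching.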

AI Summary

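Claude Fable 5 ranks first on the new Agent Arena leaderboard, leading Opus-4.8 and GPT-5.5 by the widest margin. The leaderboard is built on 300,000+ real tasks, 2 million+ tool calls, and 40 million lines of code, and it measures models on key signals such as task success rate and the ratio of user praise to complaints. Fable 5 performs extremely well on tasks it can complete, but it is less steerable. Agent Arena gives models web search, file system, and terminal tools so they can carry out complex workflows such as writing code, building slide decks, and researching the web.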


lmarena.ai: Exciting news: Claude Fable 5 ranks #1 on the new Agent Arena leaderboard! Fable 5 leads by the widest margin ever over Opus-4.8 and GPT-5.5 on two key signals: confirmed task success rate and praise vs. complaint, despi…

AI Will · 06-11 07:28
Alex Albert · 06-09 17:09
宝玉 · 06-09 17:22
rohanpaul_ai · 06-09 17:53
Decoder · 06-09 18:25
berryxia · 06-09 22:47
Artificial Analysis · 06-12 04:45
The Rundown AI · 06-09 17:09
OpenRouter · 06-09 17:13
elvis · 06-09 17:17
