Agent Arena Publishes Token Efficiency Analysis: Opus and Fable Stand Out

Why It Was Selected

Looking for a model that delivers good value per token? Agent Arena shows how well Opus and Fable hold up, and GPT-5.5 is also frugal with tokens.

AI Summary

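Agent Arena evaluates model performance on real-world tasks such as writing code and building slide decks. Opus 4.8 Thinking consumes relatively few tokens per session while delivering a +9.2% quality lift; Fable reaches the highest quality lift at +14.1%. The GPT-5.5 family (+6.2% to +8.6%) pushes past the frontier with fewer tokens. Gemini-3.5 Flash consumes the most tokens yet performs poorly, and Grok Build 0.1 uses 20K+ tokens but shows a negative lift.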

lmarena.ai, [Token efficiency in Agent Arena]: Agent Arena measures agent performance across a range of real-world tasks from our global community. Models get search, filesystem, and terminal tools to complete complex workflows: writ…
