Claude Code + GLM-5.2 Beats Opus 4.8 and GPT-5.5 in an Evaluation of 211 Real Engineering Tasks

Why It Was Selected

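Fireworks and Faros tested GLM-5.2 on real engineering tasks. Running under Claude Code, it was cheaper and faster than both Opus 4.8 and GPT-5.5, and it scored higher as well. Worth a look if you are choosing a model for coding tasks.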

AI Summary

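Fireworks and Faros_AI jointly evaluated 211 real software engineering tasks. Claude Code paired with GLM-5.2 earned a Judge score of 0.568, taking 321 seconds and costing $0.92 per task. Of the comparison setups, Claude Code + Opus 4.8 scored 0.521 at 775 seconds and $1.76 per task, and Codex + GPT-5.5 scored 0.466 at 392 seconds and $2.06 per task. The evaluation ran on Faros's own codebase rather than a public benchmark, which brings it closer to real development work.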
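To put the three result rows on a common footing, the short sketch below computes how each comparison setup relates to Claude Code + GLM-5.2. The score, time, and cost values are the ones quoted in the summary; the ratios are derived arithmetic and were not reported in the post, and the `Result` class and `compare` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One row of the reported results (names here are illustrative)."""
    name: str
    judge_score: float
    seconds_per_task: float
    usd_per_task: float


# Figures as quoted in the summary above.
RESULTS = [
    Result("Claude Code + GLM-5.2", 0.568, 321, 0.92),
    Result("Claude Code + Opus 4.8", 0.521, 775, 1.76),
    Result("Codex + GPT-5.5", 0.466, 392, 2.06),
]


def compare(baseline: Result, other: Result) -> dict:
    """Express `other` relative to `baseline` (derived, not reported, values)."""
    return {
        # Absolute gap in Judge score, baseline minus other.
        "score_gap": round(baseline.judge_score - other.judge_score, 3),
        # How many times longer `other` takes per task.
        "time_ratio": round(other.seconds_per_task / baseline.seconds_per_task, 2),
        # How many times more `other` costs per task.
        "cost_ratio": round(other.usd_per_task / baseline.usd_per_task, 2),
    }


glm = RESULTS[0]
comparisons = {r.name: compare(glm, r) for r in RESULTS[1:]}

for name, c in comparisons.items():
    print(
        f"{name}: score gap {c['score_gap']}, "
        f"{c['time_ratio']}x the time, {c['cost_ratio']}x the cost"
    )
```

Ratios like these only describe the averages quoted in the post. They say nothing about per-task variance or about how the Judge score was computed, neither of which the summary covers.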

AI Translation · Chinese

Fireworks AI
In a joint Fireworks and @Faros_AI evaluation of 211 real engineering tasks, Claude Code + GLM-5.2 beat both Claude Code + Opus 4.8 and Codex + GPT-5.5:
- Judge score: 0.568 vs. 0.521 and 0.466
- Time per task: 321s vs. …
