Mythos is live: FrontierCode becomes the new coding benchmark; Opus 4.8 and GPT 5.5 do not scale with effort on FC Diamond

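Why it was picked

The FrontierCode benchmark exposes a scaling bottleneck that today's top models hit on long tasks. Teams that evaluate AI coding or build long-running automation should take note, and Mythos can be tried directly in Devin.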

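AI summary

Mythos is officially live, and FrontierCode has been recognized as the next frontier coding benchmark. On FC Diamond, neither Opus 4.8 nor GPT 5.5 scales meaningfully with effort. The Mythos/Fable post-training approach is described as the first to apply test-time compute to very long tasks, each equivalent to tens of hours of human work and costing hundreds of dollars. It is now available in Cognition and Devin at just 1.4x ACUs.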

swyx (AI Engineer): Mythos is live! so excited to have our FrontierCode recognized as the next frontier coding bench. on FC Diamond, BOTH Opus 4.8 and GPT 5.5 don't meaningfully scale with effort, which many of you caught yesterday. Mythos/…

berryxia · 06-10 14:20 · original
OpenRouter · 06-10 19:13 · original
歸藏(guizang.ai) · 06-11 08:12 · original
elvis · 06-11 20:55 · original
marktechpost · 06-13 08:15 · original
AI Will · 06-11 07:28 · original
Decoder · 06-12 17:10 · original
宝玉 · 06-14 02:12 · original