Step 3.7 Flash Ranks Second on Claw-Eval General

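Why It Was Selected

StepFun's Step 3.7 Flash ranks second on the agent benchmark Claw-Eval General, behind only Claude Opus 4.6, with strong results in both multi-step execution and long-horizon tasks. Worth a look if you are interested.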

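AI Summary

StepFun's Step 3.7 Flash model placed second on Claw-Eval General, a benchmark that evaluates autonomous agents. The model performed strongly on multi-step execution and on robustness in long-horizon tasks, ranking just behind Claude Opus 4.6. The result points to its potential for real-world agent workloads.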

AI Translation · Chinese

阶跃星辰 StepFun: Step 3.7 Flash hits #2 on Claw-Eval General for autonomous agents. We’re seeing strong performance across multi-step execution and robustness in long-horizon tasks, ranking just behind Claude Opus 4.6. Promising signals …
