OpenRouter continuously runs GPQA and TAU-Bench and publishes the results

Why it was selected

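Before picking a model-routing tool, check the GPQA and TAU-Bench rankings that OpenRouter runs on a regular basis. Parasail and Zai currently rank first, which makes the results a useful reference point.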

AI summary

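OpenRouter continuously runs two benchmarks, GPQA and TAU-Bench, on most open-weight models and publishes the results. The scores feed its AutoExacto meta-benchmark, which is used by default when routing tool calls. Parasail and Zai currently top the leaderboard.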

Original post

OpenRouter: Tip: OpenRouter continuously runs GPQA and TAU-Bench on most open-weight models and publishes the results publicly. This informs our AutoExacto meta-benchmark, used by default when routing tool calls. Here, @Parasail_io …
