Together AI inference engine: 76% lower cost per request than Claude Opus

Why it was selected

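Teams running multi-agent coding or high-concurrency inference finally have a benchmark aimed at real workloads. Together AI's engine shows a clear advantage in both cost and speed, so it is worth testing against your own workload.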

AI summary

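Together AI's VP of Kernels points out that current inference benchmarks do not match production workloads. For the real-world case of many concurrent coding agents, each with a context of 45k-200k tokens, Together AI's inference engine is optimized around the KV cache, scheduler limits, and throughput. The reported results: TPS (tokens per second) 31% higher than the fastest open-source engine, time to first token 2x faster at saturation, and cost per request 76% lower than Claude Opus 4.6. For teams running AI agents at scale, this offers a more efficient, lower-cost inference option.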

Together AI: "One thing that we've been seeing recently is that inference benchmarks don't really match production workloads that well." - @realDanFu, VP of Kernels. When you're running dozens of concurrent coding agents, each with 4…
