Mythos Outperforms GPT-5.5 on Multiple Metrics, Raising Security Concerns

Why it was selected

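Mythos beats GPT-5.5 by a wide margin on coding and cybersecurity benchmarks. Teams working on AI safety or model evaluation should track the potential threat and harden their defenses ahead of time.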

AI summary

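Gary Marcus, citing scaling01, argues that Mythos outperforms GPT-5.5 on several benchmarks, including SWE-bench Pro (77.8% vs 58.6%), HLE (56.8% vs 41.4%), and cybersecurity tests. Mythos is stronger at vulnerability exploitation and finds security flaws more efficiently, which also creates serious security risks. Marcus warns that a full release of Mythos would cause major disruption for real-world systems that are not adequately defended. The biggest unknown for now is how Mythos performs on open-ended, real-world problems.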

AI translation · Chinese

Gary Marcus: 1. Agreed w @scaling01 that Mythos appears to be better GPT 5.5 on many metrics. 2. Mythos is definitely a major wakeup call wrt security, and will pose problems for real-world systems that aren’t well-defended. As @scal…
