New SusVibes benchmark exposes the security secret of coding agents

Why it was picked

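A Carnegie Mellon test found that only about 1 in 10 solutions written by a coding agent is secure. Don't trust AI-generated code; always run a security review.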

AI Summary

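Carnegie Mellon University built the SusVibes benchmark, which contains 200 real-world programming tasks, each drawn from an open-source project in which human developers had historically introduced a vulnerability. SWE-Agent (Claude 4 Sonnet) passed the functional tests on 61% of tasks, but only 10.5% of its solutions were secure, and more than 80% of its functionally working code contained vulnerabilities. Three attempted fixes (adding security warnings, having the agent identify weaknesses, and revealing the vulnerability type) improved security very little, while functional accuracy dropped by 7 percentage points.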
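The "more than 80%" figure can be reproduced from the other two numbers in the summary. The following is a minimal sketch, assuming that the 10.5% secure rate counts solutions that are both functionally correct and secure (so secure solutions are a subset of the 61% that pass functional tests). The function name and the rounding to whole tasks are illustrative choices, not taken from the paper.

```python
# Figures quoted in the summary above.
TOTAL_TASKS = 200
FUNCTIONAL_PASS_RATE = 0.61   # share of tasks passing functional tests
SECURE_RATE = 0.105           # ASSUMPTION: functional AND secure, as a share of all tasks


def vulnerable_share_of_working(total: int, functional_rate: float, secure_rate: float) -> float:
    """Fraction of functionally working solutions that are not secure."""
    functional = round(total * functional_rate)  # 122 tasks
    secure = round(total * secure_rate)          # 21 tasks
    return (functional - secure) / functional


share = vulnerable_share_of_working(TOTAL_TASKS, FUNCTIONAL_PASS_RATE, SECURE_RATE)
print(f"{share:.1%}")  # prints 82.8%
```

Under that assumption, 101 of the 122 working solutions are vulnerable, which matches the "more than 80%" claim.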

AlphaSignal: A new benchmark just exposed the dirty secret behind every coding agent. Millions of developers now let AI agents write entire features unsupervised. A Carnegie Mellon paper tested whether that code is safe to ship. T…