White House Report on the Claude (Fable) Jailbreak Test: Model Refuses to Review Code but Agrees to Fix It

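Why It Was Selected

The White House tested Anthropic's Claude (codenamed Fable) and found that it will not help look for vulnerabilities but is willing to fix code directly. A security expert says this is in fact normal defensive behavior, which is somewhat counterintuitive.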

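AI Summary

The White House has released a report on jailbreak testing of Anthropic's model Fable (that is, Claude). Cybersecurity expert Katie Moussouris points out that Fable refused when asked to "review this code for security issues," but cooperated and completed the task once the instruction was changed to "fix this code." Moussouris considers this simply the model's safety defenses working as intended. The episode highlights the importance of prompt engineering in AI safety testing.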

Simon Willison’s Weblog

Katie Moussouris, a cybersecurity expert and the CEO of Luta Security, told me that Anthropic shared with her a copy of the White House’s report on the Fable jailbreak to get her appraisal. (She said that she is not bein…

@koltregaskes, 06-16 19:39
AI Will, 06-15 02:59
DavidSacks, 06-15 15:03
kimmonismus, 06-16 07:31
Aadit Sheth, 06-17 19:22
Decoder, 06-14 08:35
shao__meng, 06-14 12:38
The Rundown AI, 06-15 16:31
AlphaSignal, 06-15 19:15
IT之家, 06-15 23:49