NVIDIA Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erasing and Writing

Why it was selected

A new linear attention layer from NVIDIA with decoupled erase and write gates

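AI summary

NVIDIA has released Gated DeltaNet-2, a linear attention layer that decouples the erase and write operations of the delta rule into a channel-wise erase gate b_t and a channel-wise write gate w_t. At 1.3B parameters, trained on 100B FineWeb-Edu tokens, it outperforms Mamba-2, Gated DeltaNet, KDA, and Mamba-3 on language modeling, commonsense reasoning, and long-context retrieval tasks. The largest gains appear on RULER S-NIAH and the multi-key needle retrieval benchmarks.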
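
To make the erase/write split concrete, here is a minimal Python sketch. It is not NVIDIA's implementation: the update formula, the choice to apply the gates over value channels, and the function names `matvec`, `delta_step`, and `decoupled_step` are all assumptions made for illustration. It contrasts a classic delta-rule step, where a single scalar `beta` governs both erasing and writing, with a hypothetical step driven by separate channel-wise gates `b` (erase) and `w` (write).

```python
# Illustrative sketch only. The state S is a d_v x d_k matrix stored as a
# list of rows; k is a key (length d_k) and v is a value (length d_v).
# The decoupled formula below is an assumption, not the published layer.

def matvec(S, k):
    """Read from memory: return S @ k, the value currently stored under k."""
    return [sum(s * k_j for s, k_j in zip(row, k)) for row in S]


def delta_step(S, k, v, beta):
    """Classic delta rule: S <- S + beta * (v - S k) k^T.

    One scalar beta controls both how much of the old value is erased
    and how much of the new value is written.
    """
    pred = matvec(S, k)
    return [
        [s + beta * (v_i - p_i) * k_j for s, k_j in zip(row, k)]
        for row, v_i, p_i in zip(S, v, pred)
    ]


def decoupled_step(S, k, v, b, w):
    """Hypothetical decoupled update: S <- S - (b * S k) k^T + (w * v) k^T.

    b and w are per-channel gates (length d_v), so each value channel can
    be erased and written by different amounts. This form is assumed here
    for illustration.
    """
    pred = matvec(S, k)
    return [
        [s - b_i * p_i * k_j + w_i * v_i * k_j for s, k_j in zip(row, k)]
        for row, v_i, p_i, b_i, w_i in zip(S, v, pred, b, w)
    ]
```

Under these assumptions, setting every entry of `b` and `w` to the same constant reproduces `delta_step`, while gates such as `b = [1.0, 0.0]` and `w = [1.0, 0.0]` overwrite only the first value channel and leave the association stored in the second channel untouched.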


marktechpost: Linear attention squeezes the unbounded KV cache into a fixed-size recurrent state, but editing that memory without scrambling existing associations is hard. Prior delta-rule models like Gated DeltaNet and KDA use one sc…

NVIDIA AI · 05-22 20:21 · original
Hugging Face: Blog · 05-23 00:02 · original
Decoder · 05-24 08:51 · original
Alibaba Cloud · 05-25 12:22 · original
