Google DeepMind releases DiffusionGemma: 256 tokens generated in parallel, 4x faster

Why it was picked

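Text diffusion models push generation speed to a new level. Developers working on code completion or real-time editing can try the model directly on NVIDIA's endpoints and experience parallel token generation firsthand.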

AI summary

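Google DeepMind has released DiffusionGemma, an experimental open-source model that uses text diffusion to generate 256 tokens in parallel per step, reaching inference speeds of 150+ TPS on DGX Spark or 1,000+ TPS on a single H100. The model activates only 3.8B parameters and, once quantized, runs on a consumer GPU with 24GB of VRAM. It is suited to non-linear tasks such as code infilling and inline editing. NVIDIA is providing BF16/NVFP4 checkpoints, free GPU-accelerated endpoints, and vLLM support from day one. DiffusionGemma prioritizes speed over peak quality; standard Gemma 4 is still recommended for production use.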
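To make the "256 tokens in parallel per step" claim concrete, here is a minimal back-of-the-envelope sketch that counts forward passes for token-by-token decoding versus block-parallel decoding. It is an illustration under stated assumptions, not DiffusionGemma's actual scheduler: the function names are hypothetical, and `steps_per_block` (how many refinement passes each 256-token block needs) is an assumed knob that the summary does not specify.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """Token-by-token decoding: one forward pass per generated token."""
    return max(num_tokens, 0)


def block_parallel_passes(
    num_tokens: int,
    block_size: int = 256,      # from the summary: 256 tokens per step
    steps_per_block: int = 1,   # ASSUMPTION: refinement passes per block (not in the source)
) -> int:
    """Block-parallel decoding: each pass updates `block_size` positions at once."""
    if num_tokens <= 0:
        return 0
    blocks = math.ceil(num_tokens / block_size)
    return blocks * steps_per_block


if __name__ == "__main__":
    n = 1024
    print("autoregressive passes:", autoregressive_passes(n))
    print("block-parallel passes (1 step/block):", block_parallel_passes(n))
    print("block-parallel passes (8 steps/block):", block_parallel_passes(n, steps_per_block=8))
```

The pass count only bounds the potential speedup. Real throughput also depends on per-pass cost, hardware, and quantization, which is why the reported figures differ between DGX Spark and H100.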

NVIDIA AI: Congrats to @GoogleDeepMind on the launch of DiffusionGemma. The model generates 256 tokens in parallel per step, delivering 150+ TPS on DGX Spark, and 1,000+ TPS on a single H100. We're supporting it from day one wi…

rohanpaul_ai · 06-10 18:00 · original
Decoder · 06-10 19:20 · original
vLLM · 06-12 04:10 · original
Simon Willison’s Weblog · 06-10 20:00 · original
IT之家 · 06-10 22:53 · original
小互 · 06-11 02:34 · original
karminski-牙医 (AI工具) · 06-12 04:31 · original
shao__meng · 06-10 01:20 · original
marktechpost · 06-11 08:33 · original
Richard Socher · 06-11 15:30 · original
