GLM-5.2 NVFP4 checkpoint now available in vLLM: half the memory of FP8, same accuracy

Why we picked it

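Want to save GPU memory without giving up accuracy? The NVFP4 version of GLM-5.2 is now available in vLLM. It uses half the memory of FP8 and holds up on reasoning, coding, and long-context tasks.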

AI Summary

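NVIDIA has released an NVFP4 checkpoint of GLM-5.2 that, on Blackwell GPUs, halves the memory footprint relative to FP8. The model matches FP8 accuracy on reasoning, coding, and long-context benchmarks. It can be loaded and served directly with vLLM: vllm serve nvidia/GLM-5.2-NVFP4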
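
As a rough sanity check on the "half the memory" figure, the sketch below estimates weight storage only. It assumes the commonly described NVFP4 layout (4-bit values plus one 8-bit scale shared by each block of 16 weights) and ignores the per-tensor scale, the KV cache, and activations. The parameter count is a hypothetical placeholder, not GLM-5.2's actual size.

```python
# Back-of-the-envelope weight-memory estimate; not a measurement.


def bits_per_weight_nvfp4(block_size: int = 16, scale_bits: int = 8) -> float:
    """4-bit payload plus one shared scale per block (assumed NVFP4 layout)."""
    return 4 + scale_bits / block_size


def weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Convert a parameter count and a per-weight bit width to GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical parameter count, chosen only for illustration.
NUM_PARAMS = 100e9

fp8_gib = weight_gib(NUM_PARAMS, 8)
nvfp4_gib = weight_gib(NUM_PARAMS, bits_per_weight_nvfp4())
ratio = nvfp4_gib / fp8_gib

print(f"FP8 weights:   {fp8_gib:.1f} GiB")
print(f"NVFP4 weights: {nvfp4_gib:.1f} GiB")
print(f"NVFP4 / FP8:   {ratio:.4f}")
```

Under these assumptions the ratio lands slightly above one half, because the block scales add about half a bit per weight. Real savings also depend on which layers are actually stored in NVFP4.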

vLLM: GLM-5.2 in NVFP4 is ready to serve in vLLM 🚀 @NVIDIAAI's official NVFP4 checkpoint of GLM-5.2 on Blackwell cuts the memory footprint vs FP8 while matching its accuracy across reasoning, coding, and long-context benchmar…

