AI模型精选

GLM-5.2 NVFP4格式在vLLM上线,内存减半精度不变

GLM-5.2 in NVFP4 is ready to serve in vLLM 🚀 @NVI…

精选理由

想省显存又不想降精度?GLM-5.2的NVFP4版在vLLM上线了,比FP8省一半内存,推理编码长文本都稳。

AI 摘要

NVIDIA发布GLM-5.2的NVFP4检查点,在Blackwell GPU上相比FP8内存占用降低一半。该模型在推理、编码和长上下文基准测试中保持与FP8相同的准确率。用户可通过vLLM直接加载运行:vllm serve nvidia/GLM-5.2-NVFP4。

AI 翻译 · 中文

NVIDIA发布GLM-5.2的NVFP4检查点,在Blackwell GPU上相比FP8内存占用降低一半。该模型在推理、编码和长上下文基准测试中保持与FP8相同的准确率。用户可通过vLLM直接加载运行:vllm serve nvidia/GLM-5.2-NVFP4。

vLLMGLM-5.2 in NVFP4 is ready to serve in vLLM 🚀 @NVIDIAAI's official NVFP4 checkpoint of GLM-5.2 on Blackwell cuts the memory footprint vs FP8 while matching its accuracy across reasoning, coding, and long-context benchmar