Microsoft Lens: a 3.8B-parameter image model that beats larger models through detailed captions

Why it was selected

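Teams that train or research image generation models can borrow Lens's approach of substituting detailed captions for scale, and can directly reuse its open-source code and weights, which could substantially reduce training costs.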

AI summary

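Microsoft Research has released Lens, a text-to-image model with only 3.8B parameters that matches much larger models on benchmarks at a fraction of the training cost. Its key innovation is training on 800 million detailed image descriptions generated by GPT-4.1 rather than on vague web alt text. The code and weights are open source. The result suggests that caption quality can matter more than model scale.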

Decoder: Microsoft Research presents Lens, a text-to-image model with just 3.8 billion parameters that matches much larger rivals on benchmarks, at a fraction of the training cost. The secret sauce: 800 million detailed image cap…
