Google releases Gemma 4 12B: its first mid-sized model with native audio input

Why it was selected

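Gemma 4 12B puts a native audio multimodal model within reach of small and mid-sized teams. It needs only 16GB of memory, a low bar for a model of this kind, so developers building voice interfaces or vision applications can download it and try it directly.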

AI summary

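Google has released Gemma 4 12B, its first mid-sized multimodal model with native audio input. The model uses an encoder-free architecture that feeds vision and audio directly into the large language model, and it runs in just 16GB of memory. On benchmarks its performance approaches that of a 26B-parameter model, and it is released under the open-source Apache 2.0 license. The release marks a notable step forward in multimodal capability for small and mid-sized models, and it is especially well suited to developers with limited resources.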


Philipp Schmid: We just launched a Gemma 4 12B! Our first mid-sized model with native audio inputs. Gemma 4 12 B is a unified, encoder-free multimodal model. 🧠 vision and audio directly into the LLM. 💻 Just need 16GB of memory. 📊 Ben…

Decoder · 06-03 19:54
Google AI Developers · 06-03 16:07
berryxia · 06-04 00:22
ollama · 06-04 23:34
Demis Hassabis · 06-03 18:35
Sundar Pichai · 06-03 19:36
小互 · 06-04 00:22
Patrick Loeber · 06-03 16:34
AI Breakfast · 06-05 15:03
IT之家 · 06-04 03:12
