MPI Redesigns the MoE Router: Aligning with Each Expert's Principal Singular Direction

Why It Was Selected

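Router design in MoE models has long lacked theoretical grounding. MPI offers an interpretable optimization direction, so teams training large-scale MoE models should take note: it could directly improve model efficiency.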

AI Summary

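In Mixture-of-Experts (MoE) models, the router decides which experts to activate, but its design lacks theoretical guidance. The paper proposes aligning each router row with the principal singular direction of its corresponding expert, on the grounds that this direction characterizes the expert's matrix most effectively. Building on this, the authors design Manifold Power Iteration (MPI), which follows a "power iteration, then retraction" paradigm so that each router row converges to its expert's principal singular direction. Experiments on MoE models from 1B to 11B parameters show that the method significantly improves model performance.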
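To make the "power iteration, then retraction" idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes an expert can be represented by a single weight matrix `W` of shape `(d_out, d_in)`, that the router row lives in the input space, and that the retraction is plain renormalization onto the unit sphere. The function name `power_iterate_then_retract` and the toy matrix are hypothetical.

```python
import numpy as np

def power_iterate_then_retract(W, r, steps=1):
    """Illustrative update: a power step on W^T W, then a retraction.

    Assumption: the retraction is renormalization to unit length.
    """
    for _ in range(steps):
        r = W.T @ (W @ r)            # power step amplifies the top singular direction
        r = r / np.linalg.norm(r)    # retraction back onto the unit sphere (assumed)
    return r

rng = np.random.default_rng(0)
d_in, d_out = 16, 32

# Hypothetical expert weight built with a clear gap between the first
# and second singular values, so the iteration converges quickly.
U, _ = np.linalg.qr(rng.standard_normal((d_out, d_in)))
V, _ = np.linalg.qr(rng.standard_normal((d_in, d_in)))
s = np.concatenate(([5.0], np.linspace(2.0, 1.0, d_in - 1)))
W = U @ np.diag(s) @ V.T

# Start from a random unit-norm router row and iterate.
r0 = rng.standard_normal(d_in)
r0 /= np.linalg.norm(r0)
r = power_iterate_then_retract(W, r0, steps=50)

# Compare against the top right singular vector from a full SVD.
_, _, Vh = np.linalg.svd(W, full_matrices=False)
alignment = abs(r @ Vh[0])           # |cosine| between router row and v_1
print(f"alignment with principal singular direction: {alignment:.6f}")
```

The absolute value in `alignment` accounts for the sign ambiguity of singular vectors. How the paper interleaves such a step with gradient-based training is not stated in this summary, so the sketch runs the iteration in isolation.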

Original Abstract (arXiv cs.AI)

Router is the cornerstone component to the Mixture-of-Experts models. Serving as expert proxies, the rows of the router matrix compute their similarity to the MoE inputs to determine which subset of experts is activated.…
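The routing step described in the abstract can be sketched in a few lines. This is an illustration under stated assumptions: similarity is taken to be a dot product, and the activated subset is the top `k` experts by score. Neither choice is confirmed by the excerpt, and `route_top_k` is a hypothetical helper name.

```python
import numpy as np

def route_top_k(R, x, k=2):
    """Score each expert by the dot product of its router row with x,
    then return the indices of the k highest-scoring experts (assumed rule)."""
    scores = R @ x                        # one similarity score per expert
    top = np.argsort(scores)[::-1][:k]    # expert indices, best first
    return top, scores

# Toy router with three experts in a 2-dimensional input space.
R = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, 0.0]])
x = np.array([0.9, 0.1])

experts, scores = route_top_k(R, x, k=2)
print(experts, scores)
```

Because each row acts as a proxy for one expert, the direction a row points in determines which inputs that expert receives, which is why the choice of that direction matters.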
