Transformer Geometry Observatory TGO-I: A Spectral Geometry Study of Vision Transformers

Why We Picked It

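Using the TGO framework, this paper shows how ViT dimensionality evolves during training: instead of concentrating, representations spread across more and more dimensions, most visibly in the CLS token. It is a useful reference for understanding the internal mechanics of Vision Transformers.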

AI Summary

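The researchers propose the Transformer Geometry Observatory (TGO), a systematic framework for probing the representational geometry and dynamics of Vision Transformers. TGO-I focuses on spectral geometry: it trains a ViT-Small/16 model on ImageNet-100 and tracks effective rank, stable rank, participation ratio, spectral entropy, spectral flatness, and spectral anisotropy. Over the course of training, dimensional utilization rises steadily, anisotropy falls, spectral entropy and participation ratio increase, and the eigenspectrum flattens. Contrary to intuition, variance is redistributed across representation dimensions rather than concentrated in a few, and CLS token representations show the highest effective dimensionality and the lowest anisotropy.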
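To make the metric list concrete, here is a minimal NumPy sketch that computes each quantity from a matrix of token features (rows are samples, columns are embedding dimensions). The formulas are common textbook definitions and are assumptions, not necessarily the exact ones used in TGO-I; in particular, "anisotropy" is taken here as the share of variance in the top principal direction, and the function name `spectral_metrics` is hypothetical. The synthetic isotropic and anisotropic inputs only illustrate the direction of each metric: a flatter spectrum should give higher entropy, participation ratio, and flatness, and lower anisotropy.

```python
import numpy as np


def spectral_metrics(features: np.ndarray, eps: float = 1e-12) -> dict:
    """Spectral statistics of an (n_samples, dim) feature matrix.

    Hypothetical helper: definitions are standard choices assumed for
    illustration, not taken from the TGO-I paper.
    """
    # Center each dimension so the spectrum describes the covariance.
    x = features - features.mean(axis=0, keepdims=True)
    # Singular values of centered X; covariance eigenvalues are s**2 / (n - 1).
    s = np.linalg.svd(x, compute_uv=False)
    lam = s**2 / max(x.shape[0] - 1, 1)
    total = lam.sum()

    # Spectral entropy: Shannon entropy of the normalized eigenvalue distribution.
    p = lam / total
    p = p[p > eps]
    spectral_entropy = float(-(p * np.log(p)).sum())

    # Effective rank (Roy & Vetterli): exp of entropy of normalized singular values.
    q = s / s.sum()
    q = q[q > eps]
    effective_rank = float(np.exp(-(q * np.log(q)).sum()))

    # Stable rank: ||X||_F^2 / ||X||_2^2, i.e. sum(lam) / max(lam).
    stable_rank = float(total / lam.max())

    # Participation ratio: (sum lam)^2 / sum(lam^2).
    participation_ratio = float(total**2 / (lam**2).sum())

    # Spectral flatness: geometric mean / arithmetic mean (1.0 = perfectly flat).
    spectral_flatness = float(np.exp(np.log(lam + eps).mean()) / lam.mean())

    # Anisotropy (assumed definition): variance share of the top direction.
    anisotropy = float(lam.max() / total)

    return {
        "effective_rank": effective_rank,
        "stable_rank": stable_rank,
        "participation_ratio": participation_ratio,
        "spectral_entropy": spectral_entropy,
        "spectral_flatness": spectral_flatness,
        "anisotropy": anisotropy,
    }


# Synthetic demo: isotropic features vs. features with decaying per-dimension scales.
rng = np.random.default_rng(0)
iso = rng.standard_normal((2000, 64))
aniso = iso * np.geomspace(10.0, 0.1, 64)

m_iso = spectral_metrics(iso)
m_aniso = spectral_metrics(aniso)
for name in m_iso:
    print(f"{name:>20}: isotropic={m_iso[name]:.3f}  anisotropic={m_aniso[name]:.3f}")
```

Working from eigenvalues of the centered covariance makes all six metrics views of the same spectrum, which is why they tend to move together as the summary describes. To compare CLS and patch tokens as the paper does, one would call the function separately on each token group's features.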


arXiv cs.LG: Despite the widespread adoption of Vision Transformers (ViTs) and their success across numerous computer vision applications, the fundamental understanding of their dimensional and representational geometry remains relat…
