Dan Fu's Guest Lecture in Stanford CS336: KV Cache, Megakernels, and Parcae Scaling Laws

Why It Was Selected

Dan Fu covers the KV cache and Parcae's new scaling laws

AI Summary

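In Stanford's CS336 course, Dan Fu covered the KV cache at inference time, prefill/decode disaggregation, and the architecture of large-scale inference. He introduced Megakernels, which fuse GPU operations to achieve near speed-of-light LLM decoding. He also discussed Parcae, explaining the scaling problems of looped Transformers and how to fix them, and presented new scaling laws suggesting that existing methods may be leaving intelligence on the table.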

AI Translation · Chinese

Together AI: @realDanFu guest lectured in @percyliang's CS336 at Stanford, check out what he covered:
→ The life of a token: KV cache, prefill/decode disaggregation, and what inference looks like at scale
→ Megakernels: fusing GPU op…
