Unlimited OCR: Long-Document Transcription with a Constant KV Cache

Why it's featured

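Baidu's newly released Unlimited OCR uses an attention mechanism called R-SWA, so it does not slow down when processing dozens of pages, and its memory usage stays constant. Worth a try if you work on long-document OCR.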

AI summary

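The Unlimited OCR model takes DeepSeek OCR as its baseline and replaces every decoder attention layer with Reference Sliding Window Attention (R-SWA), so the KV cache stays constant during decoding instead of growing with the output length. At the standard maximum length of 32K, Unlimited OCR can transcribe dozens of pages in a single pass. Compared with conventional end-to-end OCR models, Unlimited OCR avoids the memory growth and slowdown they suffer on long sequences. R-SWA is a general-purpose attention mechanism for parsing tasks and can also be applied to ASR, translation, and similar tasks. The code and weights are open-sourced on GitHub.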
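The summary does not spell out how R-SWA works internally. As a rough illustration of why a KV cache can stay constant during decoding, here is a minimal single-head Python sketch under an explicit assumption: each step attends to a fixed set of "reference" tokens (assumed here to be the encoded page tokens) plus only the most recent `window` generated tokens. The class `ReferenceWindowKVCache` and its parameters are hypothetical and not taken from the Unlimited OCR code; projections, multiple heads and layers are omitted.

```python
import math
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class ReferenceWindowKVCache:
    """Hypothetical sketch: fixed reference KV plus a rolling window of recent KV.

    ASSUMPTION: reference tokens (e.g. image tokens) stay cached for every step,
    while generated tokens are kept only in a window of size `window`.
    This is an illustration, not the paper's exact R-SWA definition.
    """

    def __init__(self, ref_keys: List[Vector], ref_values: List[Vector], window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.ref_keys = list(ref_keys)
        self.ref_values = list(ref_values)
        # deque(maxlen=...) silently evicts the oldest entry once full
        self.win_keys: Deque[Vector] = deque(maxlen=window)
        self.win_values: Deque[Vector] = deque(maxlen=window)

    def append(self, key: Vector, value: Vector) -> None:
        """Cache the KV pair of the newly generated token."""
        self.win_keys.append(key)
        self.win_values.append(value)

    def size(self) -> int:
        """Number of cached KV entries: bounded by len(ref) + window."""
        return len(self.ref_keys) + len(self.win_keys)

    def attend(self, query: Vector) -> Vector:
        """Scaled dot-product attention over reference + windowed entries."""
        keys = self.ref_keys + list(self.win_keys)
        values = self.ref_values + list(self.win_values)
        if not keys:
            raise ValueError("cache is empty")
        scale = 1.0 / math.sqrt(len(query))
        scores = [dot(query, k) * scale for k in keys]
        m = max(scores)  # subtract max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        dim = len(values[0])
        return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


if __name__ == "__main__":
    ref = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 toy reference tokens
    cache = ReferenceWindowKVCache(ref, ref, window=4)
    sizes = []
    for step in range(10):
        tok = [float(step % 2), 1.0 - float(step % 2)]  # toy KV for the new token
        cache.append(tok, tok)
        cache.attend(tok)
        sizes.append(cache.size())
    print(sizes)  # [4, 5, 6, 7, 7, 7, 7, 7, 7, 7]
```

Under this assumption, memory and per-step attention cost are bounded by the reference length plus the window, whereas full causal attention keeps every generated token and its cache grows linearly with the output length.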


arXiv: DeepSeek

Recently, end-to-end OCR models, exemplified by DeepSeek OCR, have once again thrust OCR into the spotlight. A widely held view is that employing a large language model (LLM) as the decoder allows the model to leverage t…

berryxia · 06-22 16:47 · original post
小互 · 06-24 03:51 · original post
向阳乔木 · 06-23 00:11 · original post
Pandaily · 06-23 08:15 · original post
AK · 06-23 18:25 · original post
