Baidu open-sources Unlimited OCR: 3B parameters, 40+ pages in a single pass, SOTA table parsing

Why it was selected

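Baidu has open-sourced Unlimited OCR, a model with 3B total parameters of which only 500M are active. It is very strong at table parsing, can read 40+ pages of a document in one pass, and in a comparison with PaddleOCR-VL-1.6 its strengths are tables and reading order. Worth a try.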

AI summary

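Unlimited OCR is an OCR model open-sourced by Baidu, with 3B total parameters of which only 500M are active. It performs well on table parsing and reading order, and reaches SOTA on OmniDocBench v1.5 and v1.6. Its core innovation is Reference Sliding Window Attention (R-SWA), which keeps the KV cache at a constant size and lets the model process 40+ pages of a document in a single forward pass. A comparison with PaddleOCR-VL-1.6 shows that it falls slightly short on semantic formatting and charts.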
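
The summary does not say how R-SWA works internally, so the following is only a toy sketch of the general idea of a constant-size KV cache, not Baidu's implementation. It assumes (hypothetically) that a small set of "reference" entries is pinned permanently while all other entries live in a sliding window; the class name `BoundedKVCache` and the parameters `num_reference` and `window` are invented for illustration.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache: pinned reference entries plus a sliding window of recent ones.

    Assumption for illustration only: the first `num_reference` entries are
    treated as the pinned reference set. The real R-SWA may choose them
    differently.
    """

    def __init__(self, num_reference: int, window: int):
        self.num_reference = num_reference
        self.reference = []                  # never evicted once filled
        self.recent = deque(maxlen=window)   # oldest entry is dropped automatically

    def append(self, kv):
        # Fill the pinned reference slots first, then use the sliding window.
        if len(self.reference) < self.num_reference:
            self.reference.append(kv)
        else:
            self.recent.append(kv)

    def visible(self):
        # Entries a new token could attend to: reference set + recent window.
        return self.reference + list(self.recent)

    def __len__(self):
        return len(self.reference) + len(self.recent)


# Stream 1000 stand-in "tokens" (plain integers instead of key/value tensors)
# and record the cache size after each step.
cache = BoundedKVCache(num_reference=4, window=8)
sizes = []
for token_id in range(1000):
    cache.append(token_id)
    sizes.append(len(cache))

print(max(sizes), cache.visible())
```

Under these assumptions the cache never holds more than `num_reference + window` entries regardless of input length, which is the property that would make memory use independent of page count.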

Jerry Liu: Unlimited OCR is a great model on table parsing and understanding proper reading order. However it does struggle a little on semantic formatting, charts (it does decent at bounding boxes). Attaching the results below to …

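marktechpost · 06-25 05:39 · original post
向阳乔木 · 06-23 00:10 · original post
小互 · 06-24 03:51 · original post
berryxia · 06-23 02:33 · original post
IT之家 · 06-25 07:42 · original post
Pandaily · 06-23 08:15 · original post
AK · 06-23 18:25 · original post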