Perplexity open-sources an efficient Unigram tokenizer, cutting CPU utilization by 5-6x

Why it was selected

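Perplexity has open-sourced its production tokenizer, which cuts CPU utilization by a factor of 5-6. Teams working on inference optimization can adopt it directly to relieve a latency bottleneck.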

AI summary

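Perplexity has open-sourced the Unigram tokenizer it runs in production, which it describes as more efficient than the HuggingFace and SentencePiece implementations. The tokenizer reduces CPU utilization by a factor of 5-6. This targets a specific bottleneck: for small rerankers and embedders running on GPUs, CPU-side tokenization latency can dominate the request. The project is available on GitHub and aims to reduce the end-to-end latency of inference pipelines.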

Original tweet

Aravind Srinivas: "Every millisecond matters. We're open sourcing the tokenizer we built and deployed on production; that's far [more] efficient than huggingface and sentencepiece."

Perplexity (@perplexity_ai): "We're open-sourcing the Unigram tok…"
