TokenPilot: A Cache-Efficient Context Management Framework for LLM Agents

Why It Was Selected

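Want to cut inference costs for long-running LLM agent sessions? TokenPilot manages the context cache intelligently, cutting costs by 56% to 87% on the PinchBench and Claw-Eval benchmarks without giving up performance.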

AI Summary

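TokenPilot proposes a dual-granularity context management framework. Ingestion-Aware Compaction stabilizes the prompt prefix and removes environmental noise, while Lifecycle-Aware Eviction monitors the residual utility of context segments. On the PinchBench and Claw-Eval benchmarks, TokenPilot reduces cost by 61% and 56%, respectively, in isolated mode and by 61% and 87% in continuous mode, while maintaining performance comparable to prior systems. The framework is integrated into LightMem2, available at https://github.com/zjunlp/LightMem2.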
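To see why a stable prompt prefix matters for cost, here is a minimal sketch. It is not TokenPilot's implementation: the function names, the toy token lists, and the pricing values (`price`, `cached_discount`) are all hypothetical, and it simply assumes a provider that bills cached prefix tokens at a discount.

```python
from typing import List


def shared_prefix_len(prev: List[str], curr: List[str]) -> int:
    """Count the leading tokens that are identical in both prompts."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def turn_cost(prev: List[str], curr: List[str],
              price: float = 1.0, cached_discount: float = 0.1) -> float:
    """Cost of sending `curr` when `prev` is already cached.

    Both pricing values are hypothetical and chosen only for illustration.
    """
    hit = shared_prefix_len(prev, curr)   # tokens served from the prefix cache
    miss = len(curr) - hit                # tokens that must be recomputed
    return hit * price * cached_discount + miss * price


system = ["sys"] * 4
history = ["t1", "t2", "t3", "t4"]
prev_prompt = system + history

# Append-only update: the old prompt is an exact prefix of the new one.
append_only = prev_prompt + ["t5"]

# Dropping an early segment: everything after the edit point misses the cache.
pruned_early = system + ["t2", "t3", "t4", "t5"]

stable_cost = turn_cost(prev_prompt, append_only)
unstable_cost = turn_cost(prev_prompt, pruned_early)
print(stable_cost, unstable_cost)
```

In this toy setup the pruned prompt is one token shorter than the append-only prompt, yet it costs more, because the edit near the start invalidates the cached prefix for every token after it.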


arXiv cs.LG

As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches utilize text pruning or dynamic memory eviction to minimize token footprints; however, their uncons…
