LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior

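Why it was selected

Do multi-agent systems tend to talk past one another? LLawCo has the agents learn for themselves to "speak when necessary" and "wait for your partner," and success rates on both PARTNR-Dialog and TDW-MAT rise by roughly 4 to 7 percentage points, which is a solid gain.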

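AI Summary

The LLawCo framework has embodied agents reflect on their failures to extract behavioral patterns, derives high-level laws from those patterns such as "speak when necessary" and "wait for your partner," and folds the laws into the agents' chain-of-thought through supervised fine-tuning. Across four backbone LLMs (such as Llama and Mistral), the average success rate improves by 4.5% on the PARTNR-Dialog benchmark and by 6.8% on the TDW-MAT benchmark. The framework markedly improves multi-agent cooperation efficiency and task success rate, and outperforms existing open-source communication frameworks.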

arXiv cs.AI

Embodied agents operating in decentralized and partially observable environments have attracted growing attention in recent years. However, existing large language model (LLM)-based agents often exhibit behaviors that ar