HExA: A Framework for LLM Self-Improvement Through Active Experimentation

Why It Was Selected

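Claude Sonnet 4.6 jumps from 2% to 77% on hard physics tasks with HExA, an active experimentation framework. It needs no complex training: the model picks up skills through its own trial and error, and those skills transfer across tasks.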

AI Summary

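HExA is a training-free, in-context learning framework that lets LLMs solve long-horizon tasks in novel domains through active experiment design, iterative refinement, and reuse of a skill library. On the Interphyre benchmark (built on the PHYRE 2D physics environment), Claude Sonnet 4.6 succeeds only 2% of the time on its own; HExA raises that to 77%. HExA also outperforms baselines such as ReAct and Reflexion, and it supports open-source models. Using only skills transferred from simpler levels, HExA still reaches a 44% success rate on new levels, which indicates that the learned skills are reusable.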
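The experiment, refine, and reuse loop described above can be illustrated with a toy sketch. This is not HExA's implementation: the summary does not specify the agent's prompts, skill format, or environment API, so every name here (`Level`, `SkillLibrary`, `solve`, `landing_gain`) is hypothetical, the quadratic "physics" is invented, and a simple bisection stands in for the experiments an LLM would propose.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Level:
    """Toy stand-in for a physics puzzle: land a launch on `target`."""
    name: str
    target: float
    tolerance: float = 0.05

    def simulate(self, power: float) -> float:
        # Invented dynamics (assumption): landing distance is 10 * power^2.
        return 10.0 * power ** 2


@dataclass
class SkillLibrary:
    """Hypothetical store of facts learned from solved levels."""
    skills: dict = field(default_factory=dict)

    def add(self, name: str, value: float) -> None:
        self.skills[name] = value

    def get(self, name: str):
        return self.skills.get(name)


def solve(level: Level, library: SkillLibrary, max_experiments: int = 30):
    """Run experiments until the level is solved.

    Returns (power, experiments_used), or (None, max_experiments) on failure.
    """
    lo, hi = 0.0, 2.0
    gain = library.get("landing_gain")
    if gain:
        # Reuse: a stored skill predicts the right power directly.
        guess = min(max(math.sqrt(level.target / gain), lo), hi)
    else:
        # No prior skill: start from the middle of the action range.
        guess = (lo + hi) / 2

    for n in range(1, max_experiments + 1):
        landing = level.simulate(guess)          # run one experiment
        if abs(landing - level.target) <= level.tolerance:
            if guess > 0:
                # Distill the observation into a reusable skill.
                library.add("landing_gain", landing / guess ** 2)
            return guess, n
        # Refine: bisection stands in for the model's revised hypothesis.
        if landing < level.target:
            lo = guess
        else:
            hi = guess
        guess = (lo + hi) / 2
    return None, max_experiments


library = SkillLibrary()
_, n_easy = solve(Level("easy", target=3.0), library)
_, n_new = solve(Level("new", target=7.0), library)
print(f"experiments on first level: {n_easy}, on new level with skills: {n_new}")
```

In this sketch the first level is solved by repeated experiments, while the second level benefits from the stored skill and needs fewer. That mirrors, in a very reduced form, the transfer result reported in the summary; it makes no claim about how HExA represents or retrieves skills.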

AI Translation · Chinese

arXiv cs.LG · Large language models (LLMs) are increasingly used to take actions in the real world and support human decision-making, yet most agents rely on parametric knowledge, fixed post-training data, retrieval, or search. This p…

AWS Machine Learning Blog · 06-29 17:52
