12:13
arXiv cs.AI@Negin Raoof, Richard Zhuang, Marianna Nezhurina, Etash Guha, Atula Tejaswi, Ryan Marten, Charlie F. Ruan, Tyler Griggs, Alexander Glenn Shaw, Hritik Bansal, E. Kelly Buchanan, Artem Gazizov, Reinhard Heckel, Chinmay Hegde, Sankalp Jajee, Daanish Khazi, Emmanouil Koukoumidis, Xiangyi Li, Hange Liu, Shlok Natarajan, Harsh Raj, Nicholas Roberts, Ethan Shen, Nishad Singhi, Michael Siu, Ashima Suvarna, Hanwen Xing, Patrick Yubeaton, Robert Zhang, Leon Liangyu Chen, Xiaokun Chen, Steven Dillmann, Saadia Gabriel, Xunyi Jiang, Anurag Kashyap, Boxuan Li, Yein Park, Minh Pham, Sujay Sanghavi, Lin Shi, Ke Sun, Yixin Wang, Zhiwei Xu, Erica Zhang, Siyan Zhao, Wanjia Zhao, Jenia Jitsev, Alex Dimakis, Benjamin Feuer, Ludwig Schmidt
The OpenThoughts-Agent project presents a fully open-source data curation pipeline for training general-purpose agent models. Across more than 100 controlled experiments, the team systematically analyzes how much data sources and data diversity matter. Using this pipeline, they build a 100K-sample training set; fine-tuning Qwen3-32B on it reaches an average accuracy of 44.8% across 7 agent benchmarks, 3.9 percentage points above the strongest open-source model, Nemotron-Terminal-32B (40.9%). In compute-controlled comparisons, the training set also shows strong scaling. All data, pipelines, and models are open-sourced at openthoughts.ai.
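To make the headline comparison concrete, this short Python check recomputes the gap from the two reported averages. It only restates the published numbers; the variable names are illustrative, and the relative-gain figure is simple arithmetic on those averages, not a number reported by the authors. It separates the absolute gain (in percentage points) from the relative gain over the baseline.

```python
# Reported average accuracies (%) across the 7 agent benchmarks, as stated in the summary.
openthoughts_agent = 44.8   # fine-tuned Qwen3-32B
nemotron_terminal = 40.9    # strongest open-source baseline

# Absolute gain, in percentage points (difference of the two accuracies).
abs_gain_pp = round(openthoughts_agent - nemotron_terminal, 1)

# Relative gain, as a percentage of the baseline accuracy.
rel_gain_pct = round(100 * (openthoughts_agent - nemotron_terminal) / nemotron_terminal, 1)

print(f"{abs_gain_pp} pp absolute, {rel_gain_pct}% relative")
```

The 3.9-point absolute gain corresponds to roughly a 9.5% relative improvement over Nemotron-Terminal-32B.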
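Why we recommend it: Want to train your own agent model? This work offers an open-source data recipe and lessons from 100-plus experiments that can save you a lot of trial and error.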