June 15
11:12
arXiv cs.AI @ Jan Batzner, Sree Harsha Nelaturu, Anastassia Kornilova, Jon Crall, Tommaso Cerruti, Yanan Long, Yifan Mai, Sanchit Ahuja, Asaf Yehudai, Marek Šuppa, John P. Lalor, Oluwagbemike Olowe, Jatin Ganhotra, Brian H. Hu, Eliya Habba, Andrew M. Bean, Chang Liu, Sander Land, Steven Dillmann, Aniketh Garikaparthi, Elron Bandel, Saki Imai, James Edgell, Wm. Matthew Kennedy, Jenny Chim, Patrick Meusling, Asteria Kaeberlein, Venkata Ramachandra Karthik Chundi, Manasi Patwardhan, Martin Ku, Austin Meek, Leon Knauer, Brian Wingenroth, Srishti Yadav, Usman Gohar, Felix Friedrich, Michelle Lin, Jennifer Mickel, Arman Cohan, Stella Biderman, Irene Solaiman, Zeerak Talat, Anka Reuel, Mubashara Akhtar, Gjergji Kasneci, Avijit Ghosh, Leshem Choshen
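The paper introduces Every Eval Ever, the first shared metadata schema and crowdsourced community repository for standardizing AI evaluation results. The schema represents each evaluation as a single JSON document, supports imports from multiple sources such as evaluation harnesses and papers, and can store per-instance outputs for fine-grained analysis. The community database currently contains 22,235 models, 2,273 unique benchmarks, and 31 evaluation formats. The paper also provides automatic converters from popular formats and evaluation harnesses to the unified schema.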
Why it is recommended: it unifies the format of AI evaluation results.
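
To make the idea of "one JSON document per evaluation" concrete, here is a minimal Python sketch of a converter from a harness-style result into a unified record that keeps per-instance outputs. This is not the actual Every Eval Ever schema: every field name (`model`, `benchmark`, `source`, `metric`, `instances`) and the input layout (`model_name`, `task`, `harness`, `examples`) are assumptions made up for illustration.

```python
import json


def to_unified_record(harness_result: dict) -> dict:
    """Convert a hypothetical harness-style result into one unified record.

    All field names here are illustrative assumptions, not the real schema.
    """
    # Keep every instance so fine-grained, per-example analysis stays possible.
    instances = [
        {
            "instance_id": str(i),
            "input": ex["prompt"],
            "output": ex["response"],
            "score": float(ex["correct"]),
        }
        for i, ex in enumerate(harness_result["examples"])
    ]
    scores = [inst["score"] for inst in instances]
    return {
        "model": harness_result["model_name"],
        "benchmark": harness_result["task"],
        # Record where the numbers came from (a harness, a paper, ...).
        "source": {"type": "harness", "name": harness_result["harness"]},
        "metric": {
            "name": "accuracy",
            "value": sum(scores) / len(scores) if scores else None,
        },
        "instances": instances,
    }


# Hypothetical input, shaped like the output of some evaluation tool.
example = {
    "model_name": "demo-model",
    "task": "demo-benchmark",
    "harness": "demo-harness",
    "examples": [
        {"prompt": "2+2=?", "response": "4", "correct": True},
        {"prompt": "3+3=?", "response": "5", "correct": False},
    ],
}

record = to_unified_record(example)
document = json.dumps(record, indent=2, ensure_ascii=False)
```

Storing the aggregate metric and the raw instances in the same document means the score can be recomputed or sliced later without rerunning the evaluation, which is presumably the motivation for the per-instance storage the summary mentions.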