Verifying and Improving the Physics-IQ Benchmark

Why We Picked It

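DeepMind has released Physics-IQ Verified, a revised version of the Physics-IQ benchmark, which evaluates how well video models understand the physical world. The original benchmark had flaws, and fixing them makes the resulting model rankings more reliable.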

AI Summary

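The study systematically audits Physics-IQ, a benchmark for physical understanding in video models, and finds flaws in the quality of its prompts and its ground-truth annotations. The authors propose three improvements: refining the prompts, correcting the ground truth, and introducing a sample-level scoring system. They validate the changes using six image-to-video generative models. The new Physics-IQ Verified improves 57.6% of the samples and 34.8% of the prompts, and it shifts the model ranking moderately (Kendall's τ = 0.46).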
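Kendall's τ measures how well two rankings of the same items agree: τ = (C − D) / (n(n − 1)/2), where C and D count the concordant and discordant pairs. A value of 1 means identical order, −1 means fully reversed, and values near 0.46 indicate that a substantial number of pairs swap places. The following is a minimal sketch of the simplest variant (tau-a), assuming no tied ranks. The model names and rankings in the usage example are hypothetical placeholders, not results from the paper, and the authors may have used a tie-aware variant such as tau-b.

```python
from itertools import combinations


def kendall_tau(rank_a: dict[str, int], rank_b: dict[str, int]) -> float:
    """Kendall's tau-a between two rankings of the same items (assumes no ties)."""
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same items")
    items = sorted(rank_a)
    n = len(items)
    if n < 2:
        raise ValueError("need at least two items to compare pairs")
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # Positive product: the pair is ordered the same way in both rankings.
        s = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


# Hypothetical rankings of six models (1 = best); not data from the paper.
original = {"model_A": 1, "model_B": 2, "model_C": 3,
            "model_D": 4, "model_E": 5, "model_F": 6}
verified = {"model_A": 2, "model_B": 1, "model_C": 3,
            "model_D": 6, "model_E": 4, "model_F": 5}

print(kendall_tau(original, verified))  # prints 0.6
```

With six models there are 15 pairs. In this hypothetical example three pairs flip order (A/B, D/E, D/F), so τ = (12 − 3) / 15 = 0.6.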

arXiv: Google DeepMind
Video generative models (VGMs) have become a new frontier that can be used not just for video generation but for a multitude of downstream tasks, including world modeling. To advance these tasks, a good video model must…