PluRule：评估AI在多元社区内容审核能力的基准

精选理由

内容审核从业者和社区运营团队会关心：现有AI模型在多元规则下表现堪忧，PluRule为评估和提升审核系统提供了关键基准，值得深入研究。

AI 摘要

社交媒体正走向多元化，不同社区有各自的规则。研究者提出了PluRule基准，包含来自1989个Reddit社区的13371条规则违规案例，覆盖9种语言。测试发现，即使是GPT-5.2等先进模型，在识别违规内容时表现也仅略优于简单基线。增加模型规模和上下文信息带来的提升有限，而通用规则（如文明用语）更容易被检测。这表明，AI在多元社区的内容审核仍面临根本性挑战。

原文 · arXiv cs.AI

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

Social media are shifting towards pluralism -- community-governed platforms where groups define their own norms. What violates rules in one community may be perfectly acceptable in another. Can AI models help moderate such pluralistic communities? We formalize the task as a multiple-choice problem, mirroring how human moderators operate in the real world: given a comment and its surrounding context, identify which specific rule, if any, is violated. We introduce PluRule, a multimodal, multilingual benchmark for detecting 13,371 rule violations across 1,989 Reddit communities spanning 2,885 rules in 9 languages. Using this benchmark, we show that state-of-the-art vision-language models struggle significantly: even GPT-5.2 with high reasoning performs only slightly better than a trivial baseline. We also find that bigger models and increased context provide marginal gains, and universal rules like civility and self-promotion are easier to detect. Our results show that moderation of pluralistic communities on social media is a fundamental challenge for language models. Our code and benchmark are publicly available.

阅读原文