Semantic Browsing: Controllable Diversity in Image Generation via Text-Level Semantic Control

Why It Was Selected

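Want to generate images with different designs on the same theme? This paper shows how to use a VLM to control diversity at the text level, which is far more reliable than randomly sampling noise.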

AI Summary

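The paper proposes a method called Semantic Browsing to address the lack of diversity in samples generated by text-to-image models. Traditional approaches rely on random noise, which produces variations with no semantic meaning, whereas Semantic Browsing uses a Vision Language Model (VLM) to apply structured semantic variations at the text level. Users can navigate the image set along interpretable semantic axes (such as object attributes or scene layout), and each variant corresponds to a concrete, understandable semantic decision. Experiments show that the method produces a diverse, browsable design space.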
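
The idea of text-level variation along named semantic axes can be illustrated with a small sketch. This is not the paper's implementation: `propose_axes` is a hypothetical stand-in for the VLM call and returns hard-coded values, and the axis names, the prompt template, and the `Variant` type are all assumptions made for illustration. Image generation itself is left out; the sketch only shows how each prompt variant can carry the semantic decisions that produced it.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Variant:
    prompt: str        # prompt text that would be sent to the image model
    decisions: tuple   # ((axis, value), ...) describing how it was built


def propose_axes(base_prompt: str) -> dict:
    """Hypothetical stand-in for a VLM call; returns hard-coded axes."""
    return {
        "material": ["wood", "brushed steel"],
        "layout": ["centered on a table", "seen from above"],
    }


def expand(base_prompt: str, axes: dict) -> list:
    """Build one prompt per combination of axis values."""
    names = list(axes)
    variants = []
    for combo in product(*(axes[n] for n in names)):
        decisions = tuple(zip(names, combo))
        detail = ", ".join(f"{n}: {v}" for n, v in decisions)
        variants.append(Variant(f"{base_prompt}, {detail}", decisions))
    return variants


def browse(variants: list, axis: str, value: str) -> list:
    """Keep only the variants that made a given semantic decision."""
    return [v for v in variants if (axis, value) in v.decisions]


variants = expand("a desk lamp", propose_axes("a desk lamp"))
wooden = browse(variants, "material", "wood")
```

Because every variant records its decisions, filtering along one axis is a plain lookup, which is what makes the resulting set browsable rather than a bag of unrelated noise samples.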

Original Abstract (arXiv cs.AI)

Modern text-to-image models excel in visual fidelity and prompt adherence. However, this strict adherence comes at the cost of diversity: generated samples tend to collapse into a single visual interpretation. Existing m…
