[Classifier-Guided-Expand] More Control for Free! Image Synthesis with Semantic Diffusion Guidance

emergency_rose · Published 2024-07-03 18:35:48

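1. Purpose

Earlier text-to-image generation methods need image-caption pairs for training, so they cannot be applied to datasets without text annotations.

This paper proposes a unified framework in which the image generation model can be guided by a reference image, by language, or by language + image.

2. Method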

Semantic Diffusion Guidance (SDG)

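1) The unconditional DDPM does not need to be retrained; only CLIP has to be fine-tuned

-> Replace the BN layers with adaptive BN layers that take the timestep t as the condition

-> Self-supervised (contrastive objective between $E_{I}(x_{0})$ and $\widetilde{E_{I}}(x_{t})$, where the parameters of $E_{I}$ are frozen and $\widetilde{E_{I}}$ is fine-tuned on noisy images), so no text annotations are required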
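
The two fine-tuning ingredients above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the paper's implementation: the per-timestep `scale_t` / `shift_t` values are hypothetical (in a real network they would be predicted from a timestep embedding), and the one-directional InfoNCE form with `temperature=0.07` is an assumption, since these notes do not spell out the exact loss.

```python
import math

def adaptive_batch_norm(x, scale_t, shift_t, eps=1e-5):
    """Batch-normalize x (batch x channels), then apply a timestep-specific scale and shift."""
    n, channels = len(x), len(x[0])
    out = [[0.0] * channels for _ in range(n)]
    for c in range(channels):
        column = [row[c] for row in x]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for i in range(n):
            normed = (x[i][c] - mean) / math.sqrt(var + eps)
            out[i][c] = scale_t[c] * normed + shift_t[c]
    return out

def l2_normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def contrastive_loss(clean_emb, noisy_emb, temperature=0.07):
    """InfoNCE: noisy_emb[i] should match clean_emb[i] rather than any other clean_emb[j]."""
    clean = [l2_normalize(v) for v in clean_emb]
    noisy = [l2_normalize(v) for v in noisy_emb]
    total = 0.0
    for i, query in enumerate(noisy):
        logits = [sum(a * b for a, b in zip(query, key)) / temperature for key in clean]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(noisy)

# Hypothetical scale/shift for one timestep (assumed values, for illustration only).
features = [[1.0, 10.0], [3.0, 30.0]]
y_t500 = adaptive_batch_norm(features, scale_t=[0.5, 2.0], shift_t=[0.1, -0.1])

# `clean` stands in for the frozen E_I(x_0); the second argument for the fine-tuned encoder on x_t.
clean = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = contrastive_loss(clean, [[0.9, 0.1], [0.1, 0.9]])
loss_swapped = contrastive_loss(clean, [[0.1, 0.9], [0.9, 0.1]])
```

The loss is small when each noisy embedding lines up with the clean embedding of the same image and large when the pairs are swapped, which is why no captions are needed for this step.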

2) Guidance

$E_{I}^{'}$ is the image encoder trained on noisy images with an additional timestep input

-> Language guidance

$E_{L}$ is the text encoder

The fine-tuned CLIP is used to predict the image-text matching score
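
As a rough illustration of how such a matching score could steer sampling, the sketch below assumes the score is a cosine similarity between the image and text embeddings (the usual CLIP convention) and that guidance shifts the mean of the reverse step by `scale * variance * gradient`, as in standard classifier guidance. `toy_image_encoder` is a hypothetical stand-in for $E_{I}^{'}$, and the finite-difference gradient replaces the autograd call a real implementation would use.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def toy_image_encoder(x_t, t):
    """Hypothetical stand-in for the timestep-conditioned encoder; this toy ignores t."""
    return [x_t[0] + 0.5 * x_t[1], x_t[1]]

def matching_score(x_t, t, text_emb):
    """Image-text matching score, assumed here to be a cosine similarity."""
    return cosine(toy_image_encoder(x_t, t), text_emb)

def score_gradient(x_t, t, text_emb, h=1e-5):
    """Central finite-difference gradient of the score with respect to x_t."""
    grad = []
    for i in range(len(x_t)):
        up, down = list(x_t), list(x_t)
        up[i] += h
        down[i] -= h
        diff = matching_score(up, t, text_emb) - matching_score(down, t, text_emb)
        grad.append(diff / (2 * h))
    return grad

def guided_mean(mean, variance, x_t, t, text_emb, scale=1.0):
    """Shift the reverse-step mean along the score gradient (classifier-guidance style)."""
    grad = score_gradient(x_t, t, text_emb)
    return [m + scale * variance * g for m, g in zip(mean, grad)]

x_t = [1.0, 0.0]        # toy two-pixel "noisy image"
text_emb = [0.0, 1.0]   # stands in for E_L(text)
new_mean = guided_mean(x_t, 0.1, x_t, t=10, text_emb=text_emb, scale=5.0)
```

In this toy setup the shifted mean scores higher against the text embedding than the starting point does, which is the effect the guidance term is meant to have at every denoising step.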

-> Image guidance

Content:

If the generated image should have a structure similar to the reference image, one can use

Style

-> Multimodal guidance
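
One plausible way to combine the two signals is a weighted sum of the language score and the image score. The sketch below assumes both are cosine similarities against the same image embedding; the names `text_weight` and `image_weight` are hypothetical, and the notes do not state how the paper weights the two terms.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def multimodal_score(image_emb, text_emb, reference_emb,
                     text_weight=1.0, image_weight=1.0):
    """Weighted sum of a language score and a reference-image score (assumed form)."""
    language_term = cosine(image_emb, text_emb)
    image_term = cosine(image_emb, reference_emb)
    return text_weight * language_term + image_weight * image_term

# Toy embeddings: the sample agrees with the text but is orthogonal to the reference image.
sample_emb = [1.0, 0.0]
both = multimodal_score(sample_emb, [1.0, 0.0], [0.0, 1.0])
text_only = multimodal_score(sample_emb, [1.0, 0.0], [0.0, 1.0], image_weight=0.0)
```

Setting one weight to zero recovers pure language or pure image guidance, so the same scoring function covers all three modes mentioned in the purpose section.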
