Applied Sciences, Vol. 14, Pages 3338: Inv-ReVersion: Enhanced Relation Inversion Based on Text-to-Image Diffusion Models

Applied Sciences doi: 10.3390/app14083338

Authors: Guangzi Zhang, Yulin Qian, Juntao Deng, Xingquan Cai

Diffusion models are widely recognized in image generation for their ability to produce high-quality images from text prompts. As demand for customized models grows, various methods have emerged to capture appearance features. However, relations between entities, another crucial aspect of images, have received far less attention. This study focuses on the challenging task of enabling models to capture specific relation concepts and generate images that express them as high-level semantics. To this end, we introduce the Inv-ReVersion framework, which uses inverse-relation text expansion to separate the feature fusion of multiple entities in images. Additionally, we employ a weighted contrastive loss that emphasizes parts of speech, helping the model learn more abstract relation concepts. We also propose a high-frequency suppressor to reduce the time spent on learning low-frequency details, enhancing the model's ability to generate image relations. Compared to existing baselines, our approach generates relation concepts between entities more accurately without additional computational cost, especially when capturing abstract relation concepts.
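The abstract does not give the formula for the weighted contrastive loss. Purely as an illustration, the sketch below shows one plausible shape for a part-of-speech-weighted loss: an InfoNCE-style objective that pulls a learnable relation embedding toward "positive" words (assumed here to be prepositions) and pushes it away from other words, with each negative scaled by a per-POS weight. The InfoNCE form, the choice of positives and negatives, the weights, the temperature `tau`, and the names `weighted_contrastive_loss` and `neg_weights` are all hypothetical assumptions, not the authors' method.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine similarity between a vector a of shape (d,) and each row of b (k, d)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return b @ a  # shape (k,)


def weighted_contrastive_loss(rel, positives, negatives, neg_weights, tau=0.07):
    """Hypothetical part-of-speech-weighted InfoNCE loss (illustrative sketch only).

    rel:         (d,) learnable relation-token embedding
    positives:   (k_p, d) embeddings of words to attract (assumed: prepositions)
    negatives:   (k_n, d) embeddings of words to repel (assumed: nouns, verbs, ...)
    neg_weights: (k_n,) strictly positive weights, assumed to be set per POS category
    tau:         temperature (hypothetical default)
    """
    rel = np.asarray(rel, dtype=float)
    positives = np.asarray(positives, dtype=float)
    negatives = np.asarray(negatives, dtype=float)
    neg_weights = np.asarray(neg_weights, dtype=float)
    if np.any(neg_weights <= 0):
        raise ValueError("neg_weights must be strictly positive")

    pos_logits = cosine_sim(rel, positives) / tau
    # log(w_n * exp(s_n)) = s_n + log(w_n): weights enter the denominator in log space
    weighted_neg = cosine_sim(rel, negatives) / tau + np.log(neg_weights)

    losses = []
    for p in pos_logits:
        terms = np.concatenate(([p], weighted_neg))
        m = terms.max()  # log-sum-exp trick for numerical stability
        log_denom = m + np.log(np.exp(terms - m).sum())
        losses.append(log_denom - p)  # equals -log(exp(p) / weighted denominator)
    return float(np.mean(losses))


# Toy 4-d example: one "preposition" direction and two other-POS directions.
positives = [[1.0, 0.0, 0.0, 0.0]]
negatives = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
weights = [2.0, 1.0]  # hypothetical: repel the first negative twice as strongly

good = weighted_contrastive_loss([1.0, 0.0, 0.0, 0.0], positives, negatives, weights)
bad = weighted_contrastive_loss([0.0, 1.0, 0.0, 0.0], positives, negatives, weights)
print(good < bad)  # True
```

In this sketch, raising the weight of a negative word makes it harder for the relation embedding to sit near that word, which is one way "emphasizing" a part of speech could be realized. The real framework may weight terms differently or apply the loss to other representations.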