Learned Representation-Guided Diffusion Models for Large-Image Generation



Detailed bibliography
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), Volume 2024, pp. 8532-8542
Main authors: Graikos, Alexandros; Yellapragada, Srikar; Le, Minh-Quan; Kapse, Saarthak; Prasanna, Prateek; Saltz, Joel; Samaras, Dimitris
Format: Conference paper; Journal article
Language: English
Published: United States, IEEE, 01.06.2024
ISSN: 1063-6919
Description
Abstract: To synthesize high-fidelity samples, diffusion models typically require auxiliary data to guide the generation process. However, it is impractical to procure the painstaking patch-level annotation effort required in specialized domains like histopathology and satellite imagery; it is often performed by domain experts and involves hundreds of millions of patches. Modern-day self-supervised learning (SSL) representations encode rich semantic and visual information. In this paper, we posit that such representations are expressive enough to act as proxies to fine-grained human labels. We introduce a novel approach that trains diffusion models conditioned on embeddings from SSL. Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. In addition, we construct larger images by assembling spatially consistent patches inferred from SSL embeddings, preserving long-range dependencies. Augmenting real data by generating variations of real images improves downstream classifier accuracy for patch-level and larger, image-scale classification tasks. Our models are effective even on datasets not encountered during training, demonstrating their robustness and generalizability. Generating images from learned embeddings is agnostic to the source of the embeddings. The SSL embeddings used to generate a large image can either be extracted from a reference image, or sampled from an auxiliary model conditioned on any related modality (e.g. class labels, text, genomic data). As proof of concept, we introduce the text-to-large image synthesis paradigm where we successfully synthesize large pathology and satellite images out of text descriptions. Code is available at this link.
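The core idea in the abstract is a diffusion model whose denoiser is conditioned on a frozen SSL embedding instead of human labels. The sketch below is only an illustration of what such conditioning could look like, assuming a toy PyTorch denoiser trained with a DDPM-style noise-prediction loss; all names (SSLConditionedDenoiser, emb_dim, the placeholder noise schedule) are hypothetical and not the authors' implementation, which is available at the linked repository.

```python
# Illustrative sketch: condition a toy denoiser on an SSL patch embedding.
import torch
import torch.nn as nn

class SSLConditionedDenoiser(nn.Module):
    def __init__(self, img_channels=3, emb_dim=384, hidden=256):
        super().__init__()
        # Project the SSL embedding and the diffusion timestep into one conditioning vector.
        self.cond_proj = nn.Sequential(
            nn.Linear(emb_dim + 1, hidden), nn.SiLU(), nn.Linear(hidden, hidden)
        )
        # Toy convolutional backbone standing in for a UNet.
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, img_channels, 3, padding=1),
        )

    def forward(self, x_noisy, t, ssl_emb):
        # x_noisy: (B, C, H, W); t: (B,); ssl_emb: (B, emb_dim) from a frozen SSL encoder.
        cond = self.cond_proj(torch.cat([ssl_emb, t[:, None].float()], dim=1))
        cond_map = cond[:, :, None, None].expand(-1, -1, *x_noisy.shape[2:])
        return self.net(torch.cat([x_noisy, cond_map], dim=1))  # predicted noise

# One training step: noise a clean patch and regress the added noise given its SSL embedding.
model = SSLConditionedDenoiser()
x0 = torch.randn(4, 3, 64, 64)         # clean image patches
ssl_emb = torch.randn(4, 384)          # embeddings from a DINO-style SSL encoder (placeholder)
t = torch.randint(0, 1000, (4,))
noise = torch.randn_like(x0)
alpha_bar = torch.rand(4, 1, 1, 1)     # placeholder noise-schedule values
x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise
loss = torch.nn.functional.mse_loss(model(x_t, t, ssl_emb), noise)
```

At generation time, the same conditioning pathway would be fed embeddings taken from a reference image, or embeddings sampled from an auxiliary model given text or class labels, which is how the abstract's large-image and text-to-large-image settings reuse one trained denoiser.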
Equal contribution.
DOI: 10.1109/CVPR52733.2024.00815