Learned Representation-Guided Diffusion Models for Large-Image Generation

Detailed bibliography
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), Vol. 2024, pp. 8532-8542
Main authors: Graikos, Alexandros; Yellapragada, Srikar; Le, Minh-Quan; Kapse, Saarthak; Prasanna, Prateek; Saltz, Joel; Samaras, Dimitris
Format: Conference Proceeding; Journal Article
Language: English
Publication details: United States: IEEE, 1 June 2024
Subject: Adaptation models; Annotations; Computational modeling; Diffusion models; Generative models; Histopathology; Image synthesis; Remote sensing; Training; Visualization
ISSN: 1063-6919

Abstract: To synthesize high-fidelity samples, diffusion models typically require auxiliary data to guide the generation process. However, it is impractical to procure the painstaking patch-level annotation effort required in specialized domains like histopathology and satellite imagery; it is often performed by domain experts and involves hundreds of millions of patches. Modern-day self-supervised learning (SSL) representations encode rich semantic and visual information. In this paper, we posit that such representations are expressive enough to act as proxies to fine-grained human labels. We introduce a novel approach that trains diffusion models conditioned on embeddings from SSL. Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images. In addition, we construct larger images by assembling spatially consistent patches inferred from SSL embeddings, preserving long-range dependencies. Augmenting real data by generating variations of real images improves downstream classifier accuracy for patch-level and larger, image-scale classification tasks. Our models are effective even on datasets not encountered during training, demonstrating their robustness and generalizability. Generating images from learned embeddings is agnostic to the source of the embeddings. The SSL embeddings used to generate a large image can either be extracted from a reference image, or sampled from an auxiliary model conditioned on any related modality (e.g. class labels, text, genomic data). As proof of concept, we introduce the text-to-large image synthesis paradigm where we successfully synthesize large pathology and satellite images out of text descriptions. Code is available at this link.
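The abstract's central mechanism, training a diffusion model whose conditioning signal is a frozen self-supervised embedding rather than a human annotation, can be sketched in a few lines. The sketch below is a minimal, hypothetical PyTorch illustration and not the authors' released code: Denoiser is a toy stand-in for the conditional denoising network (a UNet in practice), ssl_encoder stands for any frozen SSL feature extractor, and treating patches as flat vectors is a deliberate simplification.

```python
# Illustrative sketch only: a DDPM-style training step where a frozen SSL
# encoder's embedding replaces the human label as the conditioning signal.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # standard linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)     # cumulative noise levels

class Denoiser(nn.Module):
    """Toy stand-in for the conditional denoising network; predicts the added noise."""
    def __init__(self, img_dim, emb_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + emb_dim + 1, 256),
            nn.SiLU(),
            nn.Linear(256, img_dim),
        )

    def forward(self, x_t, t, cond):
        # Concatenate noisy input, SSL embedding, and a crude timestep feature.
        t_feat = (t.float() / T).unsqueeze(-1)
        return self.net(torch.cat([x_t, cond, t_feat], dim=-1))

def training_step(model, ssl_encoder, patches, opt):
    """One noise-prediction step; the frozen SSL embedding plays the role of a label."""
    with torch.no_grad():
        cond = ssl_encoder(patches)                # (B, emb_dim) frozen SSL features
    x0 = patches.flatten(1)                        # simplify: patches as flat vectors
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    ab = alpha_bars[t].unsqueeze(-1)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise    # forward process q(x_t | x_0)
    loss = ((model(x_t, t, cond) - noise) ** 2).mean()  # simple epsilon-prediction loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

At sampling time the same interface accepts embeddings from any source: extracted from a reference image, or drawn from an auxiliary model conditioned on text, class labels, or genomic data. Per the abstract, decoding a spatial grid of such embeddings patch by patch is what yields a spatially consistent large image.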
Authors Graikos, Alexandros (Stony Brook University; agraikos@cs.stonybrook.edu)
Yellapragada, Srikar (Stony Brook University)
Le, Minh-Quan (Stony Brook University)
Kapse, Saarthak (Stony Brook University)
Prasanna, Prateek (Stony Brook University)
Saltz, Joel (Stony Brook University)
Samaras, Dimitris (Stony Brook University)
CODEN IEEPAD
DOI 10.1109/CVPR52733.2024.00815
Discipline Applied Sciences
Computer Science
EISBN 9798350353006
ExternalDocumentID PMC11601131 (PubMed Central)
39606708 (PubMed)
10655298 (IEEE Xplore)
Genre orig-research
Journal Article
GrantInformation NSF: IIS-2123920, IIS-2212046 (funder ID 10.13039/100000001)
NCI: 5U24CA215109, 1R21CA258493-01A1, UR3CA225021 (funder ID 10.13039/100000054)
NCI NIH HHS: R21 CA258493, U24 CA215109, UH3 CA225021
ISICitedReferencesCount 14
Notes Equal contribution.
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/11601131
PMID 39606708
PageCount 11
PublicationTitleAbbrev CVPR
PublicationTitleAlternate Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit
URI https://ieeexplore.ieee.org/document/10655298
https://www.ncbi.nlm.nih.gov/pubmed/39606708
https://www.proquest.com/docview/3133738125
https://pubmed.ncbi.nlm.nih.gov/PMC11601131