Optimizing Diffusion Model Training Efficiency to Generate High-Resolution Images

Bibliographic details
Published in: 2025 International Conference on Intelligent Computing and Knowledge Extraction (ICICKE), pp. 1-6
Main authors: Wang, Junhua; Jiang, Yuan
Format: Conference proceeding
Language: English
Published: IEEE, 6 June 2025
Abstract To address the training-efficiency bottleneck of diffusion models in high-resolution image generation, this paper proposes a method that trains diffusion models more efficiently while still producing high-quality, high-resolution images. The method combines latent space compression and multi-stage diffusion generation in a two-stage process and builds an architecture that fuses conditional input with the latent representation. A Vector Quantized-Variational AutoEncoder (VQ-VAE) compresses high-resolution images by mapping them to a low-dimensional latent space, and the diffusion process is subdivided into multiple stages. To fuse the conditional input with the latent representation, a cross-modal cross-attention mechanism lets the model receive additional conditional input during generation. The paper also combines time step clustering with a multi-decoder architecture and an adaptive time step reduction strategy to improve training efficiency while maintaining generation quality. The results show that, after optimization, the training time of the model falls (from a maximum of 543 milliseconds to 291 milliseconds) and the memory occupancy rate falls (from an average of 42.06% to 22.12%). At the same time, the PSNR (Peak Signal-to-Noise Ratio) of the generated images increases and the FID (Fréchet Inception Distance) decreases, so generation quality is significantly improved. The proposed optimization strategy therefore improves both the training efficiency and the generation quality of the diffusion model.
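The efficiency figures and the PSNR metric named in the abstract can be made concrete with a short sketch. This is not the authors' evaluation code: the helper names `relative_reduction` and `psnr` are hypothetical, the peak value of 255 assumes 8-bit pixel data, and images are represented as flat lists of pixel values purely for illustration. Only the four input numbers come from the abstract.

```python
import math


def relative_reduction(before, after):
    """Fractional decrease when a quantity drops from `before` to `after`."""
    return (before - after) / before


def psnr(reference, generated, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB for two equal-length pixel sequences.

    Uses the standard definition 10 * log10(peak^2 / MSE).
    `peak=255.0` is an assumption (8-bit images), not stated in the abstract.
    """
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


# Figures reported in the abstract.
time_cut = relative_reduction(543, 291)        # maximum training time, ms
memory_cut = relative_reduction(42.06, 22.12)  # average memory occupancy, %

print(f"training time: {time_cut:.1%} lower")
print(f"memory occupancy: {memory_cut:.1%} lower")
```

Under these assumptions the reported figures correspond to relative reductions of roughly 46% in maximum training time and 47% in average memory occupancy. A higher PSNR means the generated image is closer to its reference, which is why the abstract treats an increase as an improvement.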
Author Wang, Junhua
Jiang, Yuan
Author_xml – sequence: 1
  givenname: Junhua
  surname: Wang
  fullname: Wang, Junhua
  email: 201054@gwng.edu.cn
  organization: University of Foreign Studies, School of Computer Science South China Business College Guangdong, Guangzhou, China
– sequence: 2
  givenname: Yuan
  surname: Jiang
  fullname: Jiang, Yuan
  email: 206101@gwng.edu.cn
  organization: University of Foreign Studies, School of Marxism South China Business College Guangdong, Guangzhou, China
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ICICKE65317.2025.11136237
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798331536817
EndPage 6
ExternalDocumentID 11136237
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
IEDL.DBID RIE
ISICitedReferencesCount 0
IngestDate Wed Oct 01 07:05:13 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 6
ParticipantIDs ieee_primary_11136237
PublicationCentury 2000
PublicationDate 2025-June-6
PublicationDateYYYYMMDD 2025-06-06
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-6
  day: 06
PublicationDecade 2020
PublicationTitle 2025 International Conference on Intelligent Computing and Knowledge Extraction (ICICKE)
PublicationTitleAbbrev ICICKE
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Adaptation models
diffusion model
Diffusion models
FID
Image coding
Image synthesis
Memory management
Optimization
PSNR
Training
training efficiency
training time
Vectors
video memory occupancy
Videos
Title Optimizing Diffusion Model Training Efficiency to Generate High-Resolution Images
URI https://ieeexplore.ieee.org/document/11136237