Optimizing Diffusion Model Training Efficiency to Generate High-Resolution Images
| Published in: | 2025 International Conference on Intelligent Computing and Knowledge Extraction (ICICKE), pp. 1-6 |
|---|---|
| Main authors: | Wang, Junhua; Jiang, Yuan |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 6 June 2025 |
| Online access: | Full text |
| Abstract | To address the training-efficiency bottleneck of diffusion models in high-resolution image generation, this paper proposes a method that optimizes diffusion model training while producing high-quality, high-resolution images. The method combines latent space compression with multi-stage diffusion generation in a two-stage process and builds an architecture that fuses conditional input with the latent representation. A Vector Quantized-Variational AutoEncoder (VQ-VAE) compresses high-resolution images by mapping them into a low-dimensional latent space, and the diffusion process is subdivided into multiple stages. To fuse conditional input with the latent representation, a cross-modal cross-attention mechanism lets the model receive additional conditional input during generation. The paper also combines time step clustering with a multi-decoder architecture and an adaptive time step reduction strategy to improve training efficiency while maintaining generation quality. The results show that, after optimization, the model's training time falls (from a maximum of 543 milliseconds to 291 milliseconds) and memory occupancy falls (from an average of 42.06% to 22.12%). At the same time, the PSNR (Peak Signal-to-Noise Ratio) of the generated images increases and the FID (Fréchet Inception Distance) decreases, indicating significantly improved generation quality. The proposed optimization strategy thus improves both the training efficiency and the generation quality of the diffusion model. |
|---|---|
| Author | Wang, Junhua; Jiang, Yuan |
| Author_xml | – sequence: 1 givenname: Junhua surname: Wang fullname: Wang, Junhua email: 201054@gwng.edu.cn organization: University of Foreign Studies, School of Computer Science, South China Business College, Guangdong, Guangzhou, China – sequence: 2 givenname: Yuan surname: Jiang fullname: Jiang, Yuan email: 206101@gwng.edu.cn organization: University of Foreign Studies, School of Marxism, South China Business College, Guangdong, Guangzhou, China |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ICICKE65317.2025.11136237 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798331536817 |
| EndPage | 6 |
| ExternalDocumentID | 11136237 |
| Genre | orig-research |
| ISICitedReferencesCount | 0 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001575314000028&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 6 |
| ParticipantIDs | ieee_primary_11136237 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-June-6 |
| PublicationDateYYYYMMDD | 2025-06-06 |
| PublicationDate_xml | – month: 06 year: 2025 text: 2025-June-6 day: 06 |
| PublicationDecade | 2020 |
| PublicationTitle | 2025 International Conference on Intelligent Computing and Knowledge Extraction (ICICKE) |
| PublicationTitleAbbrev | ICICKE |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Adaptation models; diffusion model; Diffusion models; FID; Image coding; Image synthesis; Memory management; Optimization; PSNR; Training; training efficiency; training time; Vectors; video memory occupancy; Videos |
| Title | Optimizing Diffusion Model Training Efficiency to Generate High-Resolution Images |
| URI | https://ieeexplore.ieee.org/document/11136237 |
| WOSCitedRecordID | wos001575314000028&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
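
The abstract reports that the PSNR (Peak Signal-to-Noise Ratio) of generated images improves after optimization. As a minimal sketch of how that metric is commonly computed, and not the authors' evaluation code, the following Python function derives PSNR from the mean squared error between a reference image and a generated image. The function name `psnr`, the flat-list pixel representation, and the default peak value of 255 (8-bit images) are assumptions made for illustration.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak Signal-to-Noise Ratio in decibels between two images.

    Both images are given as flat sequences of pixel intensities of equal,
    non-zero length. `max_value` is the largest possible pixel value
    (255 for 8-bit images; an assumption of this sketch).
    """
    if len(reference) != len(generated) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")

    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    # PSNR = 10 * log10(MAX^2 / MSE); higher values mean closer images.
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [0, 0, 0, 0]
    gen = [0, 0, 0, 10]
    print(f"PSNR: {psnr(ref, gen):.2f} dB")
```

A higher PSNR indicates that the generated image is closer to the reference at the pixel level, whereas FID, the other metric named in the abstract, compares feature statistics of image sets and is better when lower.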