SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity
Diffusion models have gained significant popularity in image generation tasks. However, generating high-quality content remains notably slow because it requires running model inference over many time steps. To accelerate these models, we propose to aggressively quantize both weights and activations,...
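To make the two ideas in the summary concrete, here is a minimal sketch of what low-bit quantization and per-channel activation sparsity mean. It is not the authors' method: the symmetric uniform quantizer, the function names `quantize_symmetric` and `channel_sparsity`, and the sample values are all assumptions chosen for illustration.

```python
# Illustrative sketch only; the quantizer and all names here are assumptions,
# not the scheme described in the SQ-DM paper.

def quantize_symmetric(values, bits=4):
    """Map floats to signed integer codes with a single shared scale."""
    qmax = 2 ** (bits - 1) - 1  # largest code, e.g. 7 for 4 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def channel_sparsity(channels):
    """Return the fraction of zero codes in each channel."""
    return [sum(1 for c in ch if c == 0) / len(ch) for ch in channels]


if __name__ == "__main__":
    # Hypothetical activations for one channel at one time step.
    codes, scale = quantize_symmetric([0.0, 0.1, -1.5, 3.5])
    print(codes, scale)
    # Two hypothetical channels with different amounts of zeros.
    print(channel_sparsity([codes, [0, 0, 0, 1]]))
```

Small activations round to zero under coarse quantization, which is one simple way sparsity can differ from channel to channel; computing `channel_sparsity` separately at each time step would show how that pattern changes over time.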
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7 |
|---|---|
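| Main authors: | Fan, Zichen; Dai, Steve; Venkatesan, Rangharajan; Sylvester, Dennis; Khailany, Brucek |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 22 June 2025 |
| Subjects: | Accelerator architectures; Computational modeling; Design automation; Detectors; Diffusion models; Energy consumption; Image synthesis; Optimization; Quantization (signal); Videos |
| Online access: | Full text |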
| Abstract | Diffusion models have gained significant popularity in image generation tasks. However, generating high-quality content remains notably slow because it requires running model inference over many time steps. To accelerate these models, we propose to aggressively quantize both weights and activations, while simultaneously promoting significant activation sparsity. We further observe that the stated sparsity pattern varies among different channels and evolves across time steps. To support this quantization and sparsity scheme, we present a novel diffusion model accelerator featuring a heterogeneous mixed-precision dense-sparse architecture, channel-last address mapping, and a time-step-aware sparsity detector for efficient handling of the sparsity pattern. Our 4-bit quantization technique demonstrates superior generation quality compared to existing 4-bit methods. Our custom accelerator achieves 6.91× speed-up and 51.5% energy reduction compared to traditional dense accelerators. |
|---|---|
| Author | Venkatesan, Rangharajan; Sylvester, Dennis; Dai, Steve; Khailany, Brucek; Fan, Zichen |
| Author_xml | 1. Fan, Zichen (zcfan@umich.edu), University of Michigan, Ann Arbor, MI; 2. Dai, Steve (sdai@nvidia.com), NVIDIA, Santa Clara, CA; 3. Venkatesan, Rangharajan, NVIDIA, Santa Clara, CA; 4. Sylvester, Dennis, University of Michigan, Ann Arbor, MI; 5. Khailany, Brucek, NVIDIA, Santa Clara, CA |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/DAC63849.2025.11132632 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library (IEL) (UW System Shared); IEEE Proceedings Order Plans (POP) 1998-present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798331503048 |
| EndPage | 7 |
| ExternalDocumentID | 11132632 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IH CBEJK RIE RIO |
| IEDL.DBID | RIE |
| IngestDate | Wed Oct 01 07:05:15 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 7 |
| ParticipantIDs | ieee_primary_11132632 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-June-22 |
| PublicationDateYYYYMMDD | 2025-06-22 |
| PublicationDate_xml | – month: 06 year: 2025 text: 2025-June-22 day: 22 |
| PublicationDecade | 2020 |
| PublicationTitle | 2025 62nd ACM/IEEE Design Automation Conference (DAC) |
| PublicationTitleAbbrev | DAC |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Accelerator architectures; Computational modeling; Design automation; Detectors; Diffusion models; Energy consumption; Image synthesis; Optimization; Quantization (signal); Videos |
| Title | SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity |
| URI | https://ieeexplore.ieee.org/document/11132632 |