SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7 |
|---|---|
| Main Authors: | Zichen Fan, Steve Dai, Rangharajan Venkatesan, Dennis Sylvester, Brucek Khailany |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 22.06.2025 |
| Abstract | Diffusion models have gained significant popularity in image generation tasks. However, generating high-quality content remains notably slow because it requires running model inference over many time steps. To accelerate these models, we propose to aggressively quantize both weights and activations, while simultaneously promoting significant activation sparsity. We further observe that the stated sparsity pattern varies among different channels and evolves across time steps. To support this quantization and sparsity scheme, we present a novel diffusion model accelerator featuring a heterogeneous mixed-precision dense-sparse architecture, channel-last address mapping, and a time-step-aware sparsity detector for efficient handling of the sparsity pattern. Our 4-bit quantization technique demonstrates superior generation quality compared to existing 4-bit methods. Our custom accelerator achieves 6.91× speed-up and 51.5% energy reduction compared to traditional dense accelerators. |
|---|---|
| Author | Fan, Zichen; Dai, Steve; Venkatesan, Rangharajan; Sylvester, Dennis; Khailany, Brucek |
| Authors and affiliations | 1. Zichen Fan (zcfan@umich.edu), University of Michigan, Ann Arbor, MI; 2. Steve Dai (sdai@nvidia.com), NVIDIA, Santa Clara, CA; 3. Rangharajan Venkatesan, NVIDIA, Santa Clara, CA; 4. Dennis Sylvester, University of Michigan, Ann Arbor, MI; 5. Brucek Khailany, NVIDIA, Santa Clara, CA |
| ContentType | Conference Proceeding |
| DOI | 10.1109/DAC63849.2025.11132632 |
| EISBN | 9798331503048 |
| EndPage | 7 |
| ExternalDocumentID | 11132632 |
| Genre | orig-research |
| Language | English |
| PageCount | 7 |
| PublicationDate | 2025-June-22 |
| PublicationTitle | 2025 62nd ACM/IEEE Design Automation Conference (DAC) |
| PublicationTitleAbbrev | DAC |
| PublicationYear | 2025 |
| Publisher | IEEE |
| StartPage | 1 |
| SubjectTerms | Accelerator architectures; Computational modeling; Design automation; Detectors; Diffusion models; Energy consumption; Image synthesis; Optimization; Quantization (signal); Videos |
| Title | SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity |
| URI | https://ieeexplore.ieee.org/document/11132632 |
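
The abstract combines 4-bit quantization of weights and activations with activation sparsity that differs from channel to channel and changes across time steps. The Python sketch below is a toy illustration under our own assumptions, not the SQ-DM method or its hardware: it uses symmetric per-tensor signed INT4 quantization, measures the zero fraction of each channel in a channel-last activation map (all channels of one pixel stored together), and uses a hypothetical `TimeStepSparsityDetector` with an assumed 0.5 threshold to route each channel to a dense or sparse datapath at every time step. All names, the threshold, and the activation values are illustrative.

```python
"""Toy sketch (hypothetical, not the SQ-DM design): INT4 quantization plus a
per-channel, per-time-step activation sparsity check on a channel-last map."""
from typing import Dict, List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # assumed signed 4-bit range


def quantize_int4(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to signed 4-bit integers (an assumption)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    # Round to the nearest level and clamp into the INT4 range.
    q = [max(INT4_MIN, min(INT4_MAX, round(v / scale))) for v in values]
    return q, scale


def channel_sparsity(acts_hwc: List[List[int]]) -> List[float]:
    """Zero fraction of each channel in a channel-last map.

    acts_hwc[p][c] holds channel c of pixel p, so all channels of one pixel
    sit next to each other."""
    num_channels = len(acts_hwc[0])
    zeros = [0] * num_channels
    for pixel in acts_hwc:
        for c, value in enumerate(pixel):
            if value == 0:
                zeros[c] += 1
    return [z / len(acts_hwc) for z in zeros]


class TimeStepSparsityDetector:
    """Hypothetical detector: re-measures channel sparsity at every time step and
    sends channels at or above `threshold` to a sparse datapath."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.history: Dict[int, List[float]] = {}  # time step -> per-channel sparsity

    def route(self, step: int, acts_hwc: List[List[int]]) -> List[str]:
        sparsity = channel_sparsity(acts_hwc)
        self.history[step] = sparsity
        return ["sparse" if s >= self.threshold else "dense" for s in sparsity]


if __name__ == "__main__":
    # Made-up activations: 4 pixels x 3 channels at two denoising time steps.
    step_acts = {
        999: [[0.0, 0.9, -0.2], [0.0, 0.4, 0.0], [0.1, -0.7, 0.0], [0.0, 0.3, 0.0]],
        500: [[0.8, 0.0, -0.5], [0.6, 0.0, 0.3], [0.0, 0.0, 0.2], [0.45, 0.1, 0.0]],
    }
    detector = TimeStepSparsityDetector(threshold=0.5)
    for step, acts in step_acts.items():
        num_channels = len(acts[0])
        flat = [v for pixel in acts for v in pixel]
        q, _ = quantize_int4(flat)
        # Regroup the flat INT4 values back into [pixel][channel] order.
        q_hwc = [q[i:i + num_channels] for i in range(0, len(q), num_channels)]
        print(step, detector.route(step, q_hwc))
```

With these made-up inputs the routing comes out as `['sparse', 'dense', 'sparse']` at step 999 and `['dense', 'sparse', 'dense']` at step 500. This mirrors, in miniature, the abstract's observation that the sparsity pattern varies by channel and evolves over time steps. A real accelerator would make this decision in hardware, and the paper's actual quantizer, detector, and thresholds may differ.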