SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity

Bibliographic details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main authors: Fan, Zichen; Dai, Steve; Venkatesan, Rangharajan; Sylvester, Dennis; Khailany, Brucek
Format: Conference proceeding
Language: English
Published: IEEE, 22 June 2025
Abstract Diffusion models have gained significant popularity in image generation tasks. However, generating high-quality content remains notably slow because it requires running model inference over many time steps. To accelerate these models, we propose to aggressively quantize both weights and activations, while simultaneously promoting significant activation sparsity. We further observe that the stated sparsity pattern varies among different channels and evolves across time steps. To support this quantization and sparsity scheme, we present a novel diffusion model accelerator featuring a heterogeneous mixed-precision dense-sparse architecture, channel-last address mapping, and a time-step-aware sparsity detector for efficient handling of the sparsity pattern. Our 4-bit quantization technique demonstrates superior generation quality compared to existing 4-bit methods. Our custom accelerator achieves a 6.91× speed-up and a 51.5% energy reduction compared to traditional dense accelerators.
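The abstract's two ideas, low-bit quantization and activation sparsity that differs by channel and by time step, can be illustrated with a minimal Python sketch. This is not the SQ-DM method: the symmetric uniform quantizer, the per-channel scale, the function names, and the toy activation values are all assumptions chosen for illustration only.

```python
# Illustrative sketch only; NOT the quantization scheme from the paper.
# All names and data below are hypothetical.

def quantize_symmetric(values, bits=4):
    """Uniform symmetric quantization to signed `bits`-bit integer codes.

    Uses one scale per call, derived from the largest magnitude.
    Returns (codes, scale).
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def channel_sparsity(codes_by_channel):
    """Fraction of zero-valued codes in each channel."""
    return [sum(1 for c in ch if c == 0) / len(ch) for ch in codes_by_channel]


# Hypothetical activations: 2 time steps x 2 channels x 4 values.
activations = [
    [[0.0, 0.02, 0.9, 0.0], [0.5, 0.6, 0.7, 0.8]],
    [[0.0, 0.0, 0.03, 0.0], [0.0, 0.6, 0.0, 0.8]],
]

# Quantize each channel separately and record its sparsity at each time step.
sparsity_per_step = []
for step in activations:
    quantized = [quantize_symmetric(ch)[0] for ch in step]
    sparsity_per_step.append(channel_sparsity(quantized))

print(sparsity_per_step)  # [[0.75, 0.0], [0.75, 0.5]]
```

In this toy data the first channel stays 75% zero at both steps, while the second goes from fully dense to 50% zero, which is the kind of channel-dependent, time-varying pattern the abstract says a sparsity detector has to track.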
Author Venkatesan, Rangharajan
Sylvester, Dennis
Dai, Steve
Khailany, Brucek
Fan, Zichen
Author_xml – sequence: 1
  givenname: Zichen
  surname: Fan
  fullname: Fan, Zichen
  email: zcfan@umich.edu
  organization: University of Michigan,Ann Arbor,MI
– sequence: 2
  givenname: Steve
  surname: Dai
  fullname: Dai, Steve
  email: sdai@nvidia.com
  organization: NVIDIA,Santa Clara,CA
– sequence: 3
  givenname: Rangharajan
  surname: Venkatesan
  fullname: Venkatesan, Rangharajan
  organization: NVIDIA,Santa Clara,CA
– sequence: 4
  givenname: Dennis
  surname: Sylvester
  fullname: Sylvester, Dennis
  organization: University of Michigan,Ann Arbor,MI
– sequence: 5
  givenname: Brucek
  surname: Khailany
  fullname: Khailany, Brucek
  organization: NVIDIA,Santa Clara,CA
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/DAC63849.2025.11132632
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library (IEL) (UW System Shared)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798331503048
EndPage 7
ExternalDocumentID 11132632
Genre orig-research
GroupedDBID 6IE
6IH
CBEJK
RIE
RIO
ID FETCH-LOGICAL-a179t-13abcbd6a9b74a33ed965c6d3057f8a54df6a8cedd6d617e62af9ec84577f16d3
IEDL.DBID RIE
IngestDate Wed Oct 01 07:05:15 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a179t-13abcbd6a9b74a33ed965c6d3057f8a54df6a8cedd6d617e62af9ec84577f16d3
PageCount 7
ParticipantIDs ieee_primary_11132632
PublicationCentury 2000
PublicationDate 2025-June-22
PublicationDateYYYYMMDD 2025-06-22
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-22
  day: 22
PublicationDecade 2020
PublicationTitle 2025 62nd ACM/IEEE Design Automation Conference (DAC)
PublicationTitleAbbrev DAC
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
Score 2.295223
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Accelerator architectures
Computational modeling
Design automation
Detectors
Diffusion models
Energy consumption
Image synthesis
Optimization
Quantization (signal)
Videos
Title SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity
URI https://ieeexplore.ieee.org/document/11132632
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE