SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity

Diffusion models have gained significant popularity in image generation tasks. However, generating high-quality content remains notably slow because it requires running model inference over many time steps. To accelerate these models, we propose to aggressively quantize both weights and activations,...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:2025 62nd ACM/IEEE Design Automation Conference (DAC) s. 1 - 7
Hlavní autori: Fan, Zichen, Dai, Steve, Venkatesan, Rangharajan, Sylvester, Dennis, Khailany, Brucek
Médium: Konferenčný príspevok..
Jazyk:English
Vydavateľské údaje: IEEE 22.06.2025
Predmet:
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:Diffusion models have gained significant popularity in image generation tasks. However, generating high-quality content remains notably slow because it requires running model inference over many time steps. To accelerate these models, we propose to aggressively quantize both weights and activations, while simultaneously promoting significant activation sparsity. We further observe that the stated sparsity pattern varies among different channels and evolves across time steps. To support this quantization and sparsity scheme, we present a novel diffusion model accelerator featuring a heterogeneous mixed-precision dense-sparse architecture, channel-last address mapping, and a time-step-aware sparsity detector for efficient handling of the sparsity pattern. Our 4-bit quantization technique demonstrates superior generation quality compared to existing \mathbf{4}-bit methods. Our custom accelerator achieves 6.91 \times speed-up and 51.5% energy reduction compared to traditional dense accelerators.
DOI:10.1109/DAC63849.2025.11132632