APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7 |
|---|---|
| Main Authors: | |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 22.06.2025 |
| Subjects: | |
| Summary: | DNN accelerators, significantly advanced by model compression and specialized dataflow techniques, have marked considerable progress. However, the frequent access to high-precision partial sums (PSUMs) leads to excessive memory demands in architectures utilizing input/weight stationary dataflows. Traditional compression strategies have typically overlooked PSUM quantization, which may account for 69% of power consumption. This study introduces a novel Additive Partial Sum Quantization (APSQ) method that seamlessly integrates PSUM accumulation into the quantization framework. A grouping strategy that combines APSQ with PSUM quantization, enhanced by a reconfigurable architecture, is further proposed. APSQ is nearly lossless on NLP and CV tasks across BERT, Segformer, and EfficientViT models while compressing PSUMs to INT8, reducing energy costs by a notable 28-87%. Extended experiments on LLaMA2-7B demonstrate the potential of APSQ for large language models. Code is available at https://github.com/Yonghao-Tan/APSQ. |
| DOI: | 10.1109/DAC63849.2025.11133081 |
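
The summary describes quantizing partial sums to INT8 as they are accumulated, together with a grouping strategy. As a rough illustration only, and not the paper's actual APSQ algorithm (whose details are not given in this record), the toy NumPy sketch below shows one way a carried partial sum could be fake-quantized to INT8 between accumulation groups; the function names, `group_size`, and `psum_scale` are invented for this example.

```python
import numpy as np

def fake_quant_int8(x, scale):
    # Symmetric INT8 fake-quantization: round to the INT8 grid, clip, dequantize.
    return np.clip(np.round(x / scale), -128, 127) * scale

def dot_with_grouped_psum_quant(a, w, group_size=16, psum_scale=0.05):
    # Toy dot product in which the running partial sum (PSUM) is
    # re-quantized to INT8 after every group of `group_size` products,
    # so the carried accumulator stays within an 8-bit representation.
    # Illustrative sketch only; APSQ's actual grouping and scaling are
    # defined in the paper and its released code.
    psum = 0.0
    for g in range(0, len(a), group_size):
        psum += float(np.dot(a[g:g + group_size], w[g:g + group_size]))
        psum = fake_quant_int8(psum, psum_scale)
    return psum

# Example: compare against a full-precision dot product.
rng = np.random.default_rng(0)
a = rng.standard_normal(64).astype(np.float32)
w = rng.standard_normal(64).astype(np.float32)
print(dot_with_grouped_psum_quant(a, w), float(np.dot(a, w)))
```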