APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7 |
|---|---|
| Main Authors: | |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 22.06.2025 |
| Subjects: | |
| Summary: | DNN accelerators, significantly advanced by model compression and specialized dataflow techniques, have marked considerable progress. However, the frequent access to high-precision partial sums (PSUMs) leads to excessive memory demands in architectures utilizing input/weight stationary dataflows. Traditional compression strategies have typically overlooked PSUM quantization, which may account for 69% of power consumption. This study introduces a novel Additive Partial Sum Quantization (APSQ) method that seamlessly integrates PSUM accumulation into the quantization framework. A grouping strategy that combines APSQ with PSUM quantization, enhanced by a reconfigurable architecture, is further proposed. APSQ is nearly lossless on NLP and CV tasks across BERT, Segformer, and EfficientViT models while compressing PSUMs to INT8, reducing energy costs by a notable 28-87%. Extended experiments on LLaMA2-7B demonstrate the potential of APSQ for large language models. Code is available at https://github.com/Yonghao-Tan/APSQ. |
| DOI: | 10.1109/DAC63849.2025.11133081 |
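
The summary describes quantizing partial sums to INT8 as they are accumulated, together with a grouping strategy. As a rough illustration only, and not the paper's actual APSQ algorithm (whose details are not given in this record), the toy NumPy sketch below shows one way a carried partial sum could be fake-quantized to INT8 between accumulation groups; the function names, `group_size`, and `psum_scale` are invented for this example.

```python
import numpy as np

def fake_quant_int8(x, scale):
    # Symmetric INT8 fake-quantization: round to the INT8 grid, clip, dequantize.
    return np.clip(np.round(x / scale), -128, 127) * scale

def dot_with_grouped_psum_quant(a, w, group_size=16, psum_scale=0.05):
    # Toy dot product in which the running partial sum (PSUM) is
    # re-quantized to INT8 after every group of `group_size` products,
    # so the carried accumulator stays within an 8-bit representation.
    # Illustrative sketch only; APSQ's actual grouping and scaling are
    # defined in the paper and its released code.
    psum = 0.0
    for g in range(0, len(a), group_size):
        psum += float(np.dot(a[g:g + group_size], w[g:g + group_size]))
        psum = fake_quant_int8(psum, psum_scale)
    return psum

# Example: compare against a full-precision dot product.
rng = np.random.default_rng(0)
a = rng.standard_normal(64).astype(np.float32)
w = rng.standard_normal(64).astype(np.float32)
print(dot_with_grouped_psum_quant(a, w), float(np.dot(a, w)))
```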