APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design

DNN accelerators, significantly advanced by model compression and specialized dataflow techniques, have made considerable progress. However, the frequent access of high-precision partial sums (PSUMs) leads to excessive memory demands in architectures utilizing input/weight stationary dataflows. Tra...

Bibliographic details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main authors: Tan, Yonghao, Dong, Pingcheng, Wu, Yongkun, Liu, Yu, Liu, Xuejiao, Luo, Peng, Liu, Shih-Yang, Huang, Xijie, Zhang, Dong, Liang, Luhong, Cheng, Kwang-Ting
Format: Conference paper
Language: English
Published: IEEE, 22 June 2025
Abstract DNN accelerators, significantly advanced by model compression and specialized dataflow techniques, have made considerable progress. However, the frequent access of high-precision partial sums (PSUMs) leads to excessive memory demands in architectures that use input-stationary or weight-stationary dataflows. Traditional compression strategies have typically overlooked the quantization of PSUMs, which may account for 69% of power consumption. This study introduces a novel Additive Partial Sum Quantization (APSQ) method that seamlessly integrates PSUM accumulation into the quantization framework. A grouping strategy that combines APSQ with PSUM quantization, enhanced by a reconfigurable architecture, is further proposed. APSQ performs nearly losslessly on NLP and CV tasks across BERT, Segformer, and EfficientViT models while compressing PSUMs to INT8. This leads to a notable reduction in energy costs of 28-87%. Extended experiments on LLaMA2-7B demonstrate the potential of APSQ for large language models. Code is available at https://github.com/Yonghao-Tan/APSQ.
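The idea of holding partial sums in INT8 while they are being accumulated can be illustrated with a minimal NumPy sketch. This is not the authors' APSQ algorithm (their implementation is in the repository linked in the abstract); it only shows, under simple assumptions, what folding accumulation into quantization might look like for a tiled matrix product. The function names, the tile size, and the calibration rule for `scale` are hypothetical choices made for this illustration.

```python
import numpy as np

def quantize_int8(x, scale):
    """Symmetric quantization of a float array to INT8 codes for a given scale."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def matmul_with_int8_psums(a, w, tile, scale):
    """Compute a @ w tile by tile along the reduction axis.

    Between tiles the running partial sum is kept only as INT8 codes
    (plus one float scale), mimicking a low-precision PSUM buffer.
    """
    k = a.shape[1]
    psum_q = np.zeros((a.shape[0], w.shape[1]), dtype=np.int8)
    for start in range(0, k, tile):
        # Partial product of the current reduction tile (high precision).
        partial = a[:, start:start + tile] @ w[start:start + tile, :]
        # Dequantize the stored PSUM, add the new contribution,
        # and requantize the accumulated value back to INT8.
        psum_q = quantize_int8(psum_q.astype(np.float64) * scale + partial, scale)
    return psum_q.astype(np.float64) * scale

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64))
w = rng.standard_normal((64, 8))
tile = 16

# Assumed calibration: choose the scale from the exact running sums,
# leaving a little headroom so the INT8 range is never clipped.
prefix = np.cumsum([a[:, s:s + tile] @ w[s:s + tile, :] for s in range(0, 64, tile)], axis=0)
scale = np.abs(prefix).max() / 120.0

reference = a @ w
approx = matmul_with_int8_psums(a, w, tile=tile, scale=scale)
max_abs_error = np.abs(approx - reference).max()
```

In this sketch each requantization adds at most half a quantization step of rounding error, so with four tiles the result stays within two steps (`2 * scale`) of the full-precision product, provided the scale prevents clipping.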
Author Zhang, Dong
Liu, Yu
Huang, Xijie
Luo, Peng
Liang, Luhong
Wu, Yongkun
Dong, Pingcheng
Liu, Xuejiao
Liu, Shih-Yang
Cheng, Kwang-Ting
Tan, Yonghao
Author_xml – sequence: 1
  givenname: Yonghao
  surname: Tan
  fullname: Tan, Yonghao
  organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS)
– sequence: 2
  givenname: Pingcheng
  surname: Dong
  fullname: Dong, Pingcheng
  organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS)
– sequence: 3
  givenname: Yongkun
  surname: Wu
  fullname: Wu, Yongkun
  organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS)
– sequence: 4
  givenname: Yu
  surname: Liu
  fullname: Liu, Yu
  organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS)
– sequence: 5
  givenname: Xuejiao
  surname: Liu
  fullname: Liu, Xuejiao
  organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS)
– sequence: 6
  givenname: Peng
  surname: Luo
  fullname: Luo, Peng
  organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS)
– sequence: 7
  givenname: Shih-Yang
  surname: Liu
  fullname: Liu, Shih-Yang
  organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS)
– sequence: 8
  givenname: Xijie
  surname: Huang
  fullname: Huang, Xijie
  organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS)
– sequence: 9
  givenname: Dong
  surname: Zhang
  fullname: Zhang, Dong
  organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS)
– sequence: 10
  givenname: Luhong
  surname: Liang
  fullname: Liang, Luhong
  organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS)
– sequence: 11
  givenname: Kwang-Ting
  surname: Cheng
  fullname: Cheng, Kwang-Ting
  organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS)
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/DAC63849.2025.11133081
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library (IEL) (UW System Shared)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798331503048
EndPage 7
ExternalDocumentID 11133081
Genre orig-research
GroupedDBID 6IE
6IH
CBEJK
RIE
RIO
ID FETCH-LOGICAL-a179t-81311e889ffb630218f6b120e4b8928f76dddc8e2b3a14a6e9ebb4b1c63e49123
IEDL.DBID RIE
IngestDate Wed Oct 01 07:05:15 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a179t-81311e889ffb630218f6b120e4b8928f76dddc8e2b3a14a6e9ebb4b1c63e49123
PageCount 7
ParticipantIDs ieee_primary_11133081
PublicationCentury 2000
PublicationDate 2025-June-22
PublicationDateYYYYMMDD 2025-06-22
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-22
  day: 22
PublicationDecade 2020
PublicationTitle 2025 62nd ACM/IEEE Design Automation Conference (DAC)
PublicationTitleAbbrev DAC
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
Score 2.295121
Snippet DNN accelerators, significantly advanced by model compression and specialized dataflow techniques, have marked considerable progress. However, the frequent...
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Additives
Costs
dataflow
DNN
Energy consumption
Hardware-aware quantization
Large language models
Memory architecture
Model compression
partial sum quantization
Power demand
Quantization (signal)
Reconfigurable architectures
Transformer
Transformers
Title APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design
URI https://ieeexplore.ieee.org/document/11133081
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE