APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design
DNN accelerators, significantly advanced by model compression and specialized dataflow techniques, have marked considerable progress. However, the frequent access of high-precision partial sums (PSUMs) leads to excessive memory demands in architectures utilizing input/weight stationary dataflows. Tra...
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7 |
|---|---|
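| Main authors: | Tan, Yonghao; Dong, Pingcheng; Wu, Yongkun; Liu, Yu; Liu, Xuejiao; Luo, Peng; Liu, Shih-Yang; Huang, Xijie; Zhang, Dong; Liang, Luhong; Cheng, Kwang-Ting |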
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 22 June 2025 |
| Subjects: | |
| Online access: | Get full text |
| Abstract | DNN accelerators, significantly advanced by model compression and specialized dataflow techniques, have marked considerable progress. However, the frequent access of high-precision partial sums (PSUMs) leads to excessive memory demands in architectures utilizing input/weight stationary dataflows. Traditional compression strategies have typically overlooked PSUM quantization, which may account for 69% of power consumption. This study introduces a novel Additive Partial Sum Quantization (APSQ) method, seamlessly integrating PSUM accumulation into the quantization framework. A grouping strategy that combines APSQ with PSUM quantization enhanced by a reconfigurable architecture is further proposed. The APSQ performs nearly lossless on NLP and CV tasks across BERT, Segformer, and EfficientViT models while compressing PSUMs to INT8. This leads to a notable reduction in energy costs by 28-87%. Extended experiments on LLaMA2-7B demonstrate the potential of APSQ for large language models. Code is available at https://github.com/Yonghao-Tan/APSQ. |
|---|---|
| Author | Zhang, Dong; Liu, Yu; Huang, Xijie; Luo, Peng; Liang, Luhong; Wu, Yongkun; Dong, Pingcheng; Liu, Xuejiao; Liu, Shih-Yang; Cheng, Kwang-Ting; Tan, Yonghao |
| Author_xml | – sequence: 1 givenname: Yonghao surname: Tan fullname: Tan, Yonghao organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS) – sequence: 2 givenname: Pingcheng surname: Dong fullname: Dong, Pingcheng organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS) – sequence: 3 givenname: Yongkun surname: Wu fullname: Wu, Yongkun organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS) – sequence: 4 givenname: Yu surname: Liu fullname: Liu, Yu organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS) – sequence: 5 givenname: Xuejiao surname: Liu fullname: Liu, Xuejiao organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS) – sequence: 6 givenname: Peng surname: Luo fullname: Luo, Peng organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS) – sequence: 7 givenname: Shih-Yang surname: Liu fullname: Liu, Shih-Yang organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS) – sequence: 8 givenname: Xijie surname: Huang fullname: Huang, Xijie organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS) – sequence: 9 givenname: Dong surname: Zhang fullname: Zhang, Dong organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS) – sequence: 10 givenname: Luhong surname: Liang fullname: Liang, Luhong organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS) – sequence: 11 givenname: Kwang-Ting surname: Cheng fullname: Cheng, Kwang-Ting organization: The Hong Kong University of Science and Technology, AI Chip Center for Emerging Smart Systems (ACCESS) |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/DAC63849.2025.11133081 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library (IEL) (UW System Shared) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798331503048 |
| EndPage | 7 |
| ExternalDocumentID | 11133081 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IH CBEJK RIE RIO |
| ID | FETCH-LOGICAL-a179t-81311e889ffb630218f6b120e4b8928f76dddc8e2b3a14a6e9ebb4b1c63e49123 |
| IEDL.DBID | RIE |
| IngestDate | Wed Oct 01 07:05:15 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a179t-81311e889ffb630218f6b120e4b8928f76dddc8e2b3a14a6e9ebb4b1c63e49123 |
| PageCount | 7 |
| ParticipantIDs | ieee_primary_11133081 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-June-22 |
| PublicationDateYYYYMMDD | 2025-06-22 |
| PublicationDate_xml | – month: 06 year: 2025 text: 2025-June-22 day: 22 |
| PublicationDecade | 2020 |
| PublicationTitle | 2025 62nd ACM/IEEE Design Automation Conference (DAC) |
| PublicationTitleAbbrev | DAC |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| Score | 2.295121 |
| Snippet | DNN accelerators, significantly advanced by model compression and specialized dataflow techniques, have marked considerable progress. However, the frequent... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Additives; Costs; dataflow; DNN; Energy consumption; Hardware-aware quantization; Large language models; Memory architecture; Model compression; partial sum quantization; Power demand; Quantization (signal); Reconfigurable architectures; Transformer; Transformers |
| Title | APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design |
| URI | https://ieeexplore.ieee.org/document/11133081 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| link | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1LSwMxEA5aPHhSseKbHLym3Uc2D29La1GQsqUqvZVkM1Gh7Urt6t83SbeKBw_ehpAQmEmYZOb7ZhC6ijTjOs0MUZxlhDImnOQecjoBwY1lMg11tp_u-XAoJhNZNGT1wIUBgAA-g44XQy7fVGXtQ2Vd3xY9jTzReptztiZrNazfOJLdft5zp4l6-kmSdTaTf7VNCV5jsPfP_fZR-4d_h4tvz3KAtmBxiO7yYjy6xrkxAe6DC290NcPjeo5HtVNQw6jEPrSK89lz5f79L3Pic_Ofagm4V5F-wGu00ePg5qF3S5pGCES5-7IiwtfEASGktZql3itbpuMkAqqFTITlzBhTCkh0qmKqGEjQmuq4ZClQ6XzTEWotqgUcI6wgUkq41YYLmnGuuAFqS2OlZVxm6Qlqez1M39a1LqYbFZz-MX6Gdr22PXgqSc5Ra7Ws4QLtlB-r1_flZbDQF6d1knU |
| linkProvider | IEEE |
| linkToHtml | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1LTwIxEG4MmuhJjRjf9uB1Ybfb7cPbBiQQkSwBDTfSbqdqAqxBVv--22XRePDgbdK0aTLTZtqZ75tB6MbXjOswMp7iLPIoY6KQioecJiC4sUyGZZ3tpz4fDMRkIpOKrF5yYQCgBJ9Bw4llLt9kae5CZU3XFj30HdF6O6KU-Gu6VsX7DXzZbMet4jxRR0AhUWMz_VfjlNJvdPb_ueMBqv8w8HDy7VsO0RYsjlAvTkbDWxwbUwJ-cOLMrmZ4lM_xMC9UVHEqsQuu4nj2nBU__5e557Lzn2oJuJV57RKxUUePnbtxq-tVrRA8VdyYlSdcVRwQQlqrWej8smU6ID5QLSQRljNjTCqA6FAFVDGQoDXVQcpCoLLwTseotsgWcIKwAl8pUaw2XNCIc8UNUJsaKy3jMgpPUd3pYfq2rnYx3ajg7I_xa7TbHT_0p_3e4P4c7TnNOygVIReotlrmcIl20o_V6_vyqrTWF3aNlbw |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2025+62nd+ACM%2FIEEE+Design+Automation+Conference+%28DAC%29&rft.atitle=APSQ%3A+Additive+Partial+Sum+Quantization+with+Algorithm-Hardware+Co-Design&rft.au=Tan%2C+Yonghao&rft.au=Dong%2C+Pingcheng&rft.au=Wu%2C+Yongkun&rft.au=Liu%2C+Yu&rft.date=2025-06-22&rft.pub=IEEE&rft.spage=1&rft.epage=7&rft_id=info:doi/10.1109%2FDAC63849.2025.11133081&rft.externalDocID=11133081 |
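
The abstract describes compressing partial sums (PSUMs) to INT8 by folding accumulation into the quantization step. As a rough illustration of that general idea only, the sketch below keeps the running partial sum as an 8-bit code and requantizes it after each tile is added. This is not the authors' algorithm (the linked repository holds that); the function names, the fixed scale of 16, and the sample per-tile values are all assumptions made for this example.

```python
def quantize(x, scale, bits=8):
    """Symmetric uniform quantization: round x/scale, then clamp to the signed range."""
    qmax = 2 ** (bits - 1) - 1
    q = round(x / scale)
    return max(-qmax - 1, min(qmax, q))


def accumulate_quantized(tile_psums, scale, bits=8):
    """Accumulate per-tile partial sums while storing only a low-bit running code.

    Hypothetical helper: a plain requantize-after-add loop, not the APSQ method itself.
    """
    stored = 0  # low-bit code that would live in the PSUM buffer
    for p in tile_psums:
        running = stored * scale + p             # dequantize, add this tile's partial sum
        stored = quantize(running, scale, bits)  # requantize before writing back
    return stored * scale


tile_psums = [1200, -340, 560, 75]  # made-up per-tile partial sums
exact = sum(tile_psums)             # full-precision reference: 1495
approx = accumulate_quantized(tile_psums, scale=16)  # 1504 with this scale
```

With these sample values the stored code never leaves the INT8 range, and the final result differs from the full-precision sum by 9, which reflects rounding error building up across the requantization steps. Storing an 8-bit code instead of a 32-bit accumulator cuts PSUM buffer traffic by a factor of four in this toy setting; the energy figures quoted in the abstract come from the paper's own evaluation, not from this sketch.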