MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with Co-designed Compressed Neural Networks
| Published in: | arXiv.org |
|---|---|
| Main authors: | Syuan-Hao Sie; Jye-Luen Lee; Yi-Ren Chen; Chih-Cheng Lu; Chih-Cheng Hsieh; Meng-Fan Chang; Kea-Tiong Tang |
| Medium: | Paper |
| Language: | English |
| Publisher details: | Ithaca: Cornell University Library, arXiv.org, 25 May 2021 |
| Subject: | Accelerators; Algorithms; Artificial neural networks; Co-design; Computer architecture; Hardware; Machine learning; Macros; Matrix algebra; Matrix methods; Measurement; Multiplication; Neural networks; Random access memory; Sparsity; Static random access memory |
| ISSN: | 2331-8422 |
| Online access: | https://www.proquest.com/docview/2454519067 |
| Abstract | Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overhead and substantial computation cost of CNNs are problematic in hardware accelerators. Computing-in-memory (CIM) architectures have demonstrated great potential for efficiently computing large-scale matrix-vector multiplication. However, the intensive multiply-and-accumulate (MAC) operations executed at the crossbar array and the limited capacity of CIM macros remain bottlenecks for further improvement of energy efficiency and throughput. To reduce computation costs, network pruning and quantization are two widely studied compression methods that shrink the model size. However, most model compression algorithms can only be implemented in digital CNN accelerators. For implementation in a static random access memory (SRAM) CIM-based accelerator, a model compression algorithm must account for the hardware limitations of CIM macros, such as the number of word lines and bit lines that can be turned on simultaneously, and for how weights are mapped onto the SRAM CIM macro. In this study, a software-hardware co-design approach is proposed to design an SRAM CIM-based CNN accelerator together with an SRAM CIM-aware model compression algorithm. To avoid the high-precision MAC operations required by batch normalization (BN), a quantization algorithm that fuses BN into the weights is proposed. Furthermore, to reduce the number of network parameters, a sparsity algorithm that accounts for the CIM architecture is proposed. Finally, MARS, a CIM-based CNN accelerator that utilizes multiple SRAM CIM macros as processing units and supports sparse neural networks, is proposed. |
|---|---|
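The BN-fusion step mentioned in the abstract can be illustrated with the standard batch-norm folding identity, which absorbs the BN scale and shift into the preceding convolution's weights and bias so that no separate high-precision scale/shift remains at inference. The sketch below assumes a conventional per-output-channel BN; it is an illustration of the general technique, not necessarily the paper's exact quantization-aware variant.

```python
import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm layer's affine transform into the preceding conv.
    Shapes: w is (out_ch, in_ch, kh, kw); b, gamma, beta, mean, var are (out_ch,)."""
    scale = gamma / np.sqrt(var + eps)          # per-output-channel scale
    w_folded = w * scale[:, None, None, None]   # scale each output filter
    b_folded = beta + scale * (b - mean)        # absorb the shift into the bias
    return w_folded, b_folded

# Sanity check: conv followed by BN equals the folded conv on one input patch.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 3, 3)); b = rng.normal(size=4)
gamma = rng.normal(size=4); beta = rng.normal(size=4)
mean = rng.normal(size=4); var = rng.uniform(0.5, 2.0, size=4)

x = rng.normal(size=(3, 3, 3))                  # one 3x3x3 input patch
conv = np.array([(w[k] * x).sum() + b[k] for k in range(4)])
bn_out = gamma * (conv - mean) / np.sqrt(var + 1e-5) + beta

wf, bf = fold_bn_into_conv(w, b, gamma, beta, mean, var)
folded = np.array([(wf[k] * x).sum() + bf[k] for k in range(4)])
assert np.allclose(bn_out, folded)
```

Because the folded weights can then be quantized like any other weights, the MAC operation stays within the limited precision of the CIM crossbar, which is the motivation the abstract gives for fusing BN.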
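The CIM-aware sparsity idea can be illustrated with group-wise magnitude pruning, where weights that would share a macro column are kept or dropped together, so the surviving pattern stays friendly to SRAM CIM word-line mapping. The `group` size and the consecutive-weight mapping below are hypothetical, for illustration only; the paper's exact scheme may differ.

```python
import numpy as np

def prune_by_cim_group(w, group, keep_ratio):
    """Structured pruning sketch: score weights in groups of `group`
    consecutive entries (one hypothetical macro column each) by L1 norm,
    keep the top keep_ratio fraction of groups, and zero the rest."""
    flat = w.reshape(-1, group)                  # one row per macro column
    scores = np.abs(flat).sum(axis=1)            # L1 importance per group
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.argsort(scores)[-k:]               # indices of groups to keep
    mask = np.zeros_like(flat)
    mask[keep] = 1.0
    return (flat * mask).reshape(w.shape)

w = np.arange(16, dtype=float).reshape(4, 4)     # toy 4x4 weight tile
pruned = prune_by_cim_group(w, group=4, keep_ratio=0.5)
# The two lowest-magnitude rows are zeroed as whole groups.
```

Pruning whole groups (rather than individual weights) is what makes the sparsity exploitable in hardware: an entire macro column can be skipped, instead of scattering zeros that the crossbar would still have to traverse.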
| Author | Syuan-Hao Sie; Jye-Luen Lee; Yi-Ren Chen; Chih-Cheng Lu; Chih-Cheng Hsieh; Meng-Fan Chang; Kea-Tiong Tang |
| ContentType | Paper |
| Copyright | 2021. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.48550/arxiv.2010.12861 |
| Discipline | Physics |
| EISSN | 2331-8422 |
| Genre | Working Paper/Pre-Print |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PublicationDate | 2021-05-25 |
| PublicationPlace | Ithaca |
| PublicationTitle | arXiv.org |
| PublicationYear | 2021 |
| Publisher | Cornell University Library, arXiv.org |
| SubjectTerms | Accelerators; Algorithms; Artificial neural networks; Co-design; Computer architecture; Hardware; Machine learning; Macros; Matrix algebra; Matrix methods; Measurement; Multiplication; Neural networks; Random access memory; Sparsity; Static random access memory |
| Title | MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with Co-designed Compressed Neural Networks |
| URI | https://www.proquest.com/docview/2454519067 |