GAAS: An Efficient Group Associated Architecture and Scheduler Module for Sparse CNN Accelerators
| Published in: | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 39, Issue 12, pp. 5170-5182 |
|---|---|
| Main authors: | Wang, Jingyu; Yuan, Zhe; Liu, Ruoyang; Feng, Xiaoyu; Du, Li; Yang, Huazhong; Liu, Yongpan |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.12.2020 |
| ISSN: | 0278-0070, 1937-4151 |
| Abstract | Convolutional neural networks (CNNs) have become powerful algorithms for a wide range of tasks. Application-specific integrated circuits (ASICs) are widely used to accelerate CNNs on mobile platforms because of their high energy efficiency and performance. Meanwhile, CNNs have become much sparser with the development of network pruning algorithms, and recent works have exploited this sparsity to improve the energy efficiency and performance of ASIC accelerators. However, some of these methods suffer from large output memory overhead and from performance degradation induced by hash collisions. To overcome these problems, we propose GAAS: an efficient group associated architecture and scheduler module for sparse CNN accelerators. It achieves smaller output memory overhead and higher performance than the state-of-the-art accelerator. GAAS consists of two main parts: 1) an n-way group associated architecture that reduces the output memory overhead and 2) a scheduler module that improves performance. In addition, a load-balancing algorithm is implemented in the scheduler module to further improve performance by reducing the hash collision rate. To demonstrate the efficiency of GAAS, we implement a 4-way image-principal associated architecture with a 16×16 PE array together with the scheduler module. Experimental results on AlexNet, VGG16, ResNet18, and MobileNet show that GAAS reduces the output memory overhead by 50% and improves performance by 1.53×, 1.62×, 1.46×, and 1.55×, respectively. |
|---|---|
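The abstract names two mechanisms: an n-way group associated output buffer to cut output memory overhead, and a scheduler with a load-balancing algorithm to reduce hash collisions. As a rough intuition aid only (the bank count, hash function, and stall model below are illustrative assumptions, not the paper's actual microarchitecture), a toy simulation shows why adding associativity ways to hashed accumulator banks reduces collision stalls:

```python
import random

def stall_cycles(addresses, num_banks, ways):
    """Toy model: each cycle, `num_banks` partial sums are hashed to
    accumulator banks; a bank can absorb up to `ways` writes per cycle,
    and every excess write costs one stall."""
    stalls = 0
    for i in range(0, len(addresses), num_banks):
        cycle = addresses[i:i + num_banks]
        counts = {}
        for a in cycle:
            b = a % num_banks          # simple hash: low bits pick the bank
            counts[b] = counts.get(b, 0) + 1
        # writes beyond the bank's `ways` capacity in this cycle stall
        stalls += sum(max(0, c - ways) for c in counts.values())
    return stalls

random.seed(0)
addrs = [random.randrange(4096) for _ in range(10000)]
direct = stall_cycles(addrs, num_banks=16, ways=1)
assoc4 = stall_cycles(addrs, num_banks=16, ways=4)
print(direct, assoc4)  # more ways -> fewer collision stalls
```

In this sketch a direct-mapped buffer (ways=1) stalls whenever two partial sums hash to the same bank in a cycle, while a 4-way group can absorb up to four such writes, mirroring the qualitative argument in the abstract; the paper's scheduler additionally rebalances work to lower the collision rate itself.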
| Author | Wang, Jingyu; Liu, Yongpan; Feng, Xiaoyu; Yang, Huazhong; Yuan, Zhe; Du, Li; Liu, Ruoyang |
| Author_xml | 1. Wang, Jingyu (ORCID 0000-0002-7160-4165), Department of Electronic Engineering, Tsinghua University, Beijing, China; 2. Yuan, Zhe, Department of Electronic Engineering, Tsinghua University, Beijing, China; 3. Liu, Ruoyang (ORCID 0000-0001-9873-6574), Department of Electronic Engineering, Tsinghua University, Beijing, China; 4. Feng, Xiaoyu, Department of Electronic Engineering, Tsinghua University, Beijing, China; 5. Du, Li (ORCID 0000-0001-6346-6615, duli@bupt.edu.cn), School of Information and Communication Engineering and Beijing Key Laboratory of Network System and Network Culture, Beijing University of Posts and Telecommunications, Beijing, China; 6. Yang, Huazhong (ORCID 0000-0003-2421-353X), Department of Electronic Engineering, Tsinghua University, Beijing, China; 7. Liu, Yongpan (ORCID 0000-0002-4892-2309, ypliu@tsinghua.edu.cn), Department of Electronic Engineering, Tsinghua University, Beijing, China |
| BookMark | eNp9kDtPwzAUhS0EEuXxAxCLJeYUX8eOY7aoQEHiMbTMkWvfqEYlLrYz8O9JVcTAwHSGe757pO-EHPahR0IugE0BmL5ezprbKWecTbmuKiHhgExAl6oQIOGQTBhXdcGYYsfkJKV3xkBIrifEzJtmcUObnt51nbce-0znMQxb2qQUrDcZHW2iXfuMNg8RqekdXdg1umGDkT6HXdIuRLrYmpiQzl5eaGMtjleTQ0xn5Kgzm4TnP3lK3u7vlrOH4ul1_jhrngrLdZkLMCWWbiUBVlYidivhKtAoqsoywaCGDlgpUTilmDLSaQ4WtBMrZrnTQpWn5Gr_dxvD54Apt-9hiP042XJR8VrW487Ygn3LxpBSxK7dRv9h4lcLrN2ZbHcm253J9sfkyKg_jPXZZB_6HI3f_Ete7kmPiL9LtZaq0rr8BuY4gUo |
| CODEN | ITCSDI |
| CitedBy_id | 10.1016/j.neucom.2024.128700; 10.1109/JSSC.2021.3126625; 10.3390/electronics13081564; 10.1109/TVLSI.2023.3298509 |
| Cites_doi | 10.1145/3352460.3358275 10.1109/ISSCC.2017.7870353 10.1109/ISSCC.2019.8662302 10.1109/CVPR.2016.90 10.1016/j.patcog.2017.10.013 10.1145/3005348 10.1145/3079856.3080254 10.1016/j.neuroimage.2017.02.035 10.1109/CVPR.2009.5206848 10.5244/C.29.31 10.1016/j.neucom.2016.12.038 10.1109/JPROC.2017.2761740 10.1109/TCSVT.2017.2736553 10.1109/TPAMI.2017.2700390 10.1109/JSSC.2016.2616357 10.1145/3007787.3001138 10.1109/A-SSCC47793.2019.9056918 10.1109/VLSIC.2018.8502404 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| DOI | 10.1109/TCAD.2020.2966451 |
| DatabaseName | IEEE Xplore (IEEE) IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| Discipline | Engineering |
| EISSN | 1937-4151 |
| EndPage | 5182 |
| ExternalDocumentID | 10_1109_TCAD_2020_2966451 8957699 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: NSFC grantid: 61934005; 61674094; 61720106013 funderid: 10.13039/501100001809 – fundername: National Key Research and Development Program of China; National Key Research and Development Program grantid: 2018YFA0701500 funderid: 10.13039/501100012166 – fundername: Beijing National Research Center for Information Science and Technology funderid: 10.13039/501100017582 – fundername: Beijing Innovation Center for Future Chip funderid: 10.13039/501100012282 |
| ISICitedReferencesCount | 4 |
| ISSN | 0278-0070 |
| IngestDate | Sun Jun 29 16:17:07 EDT 2025 Sat Nov 29 01:40:42 EST 2025 Tue Nov 18 22:31:17 EST 2025 Wed Aug 27 02:28:32 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 12 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0002-7160-4165 0000-0003-2421-353X 0000-0001-6346-6615 0000-0001-9873-6574 0000-0002-4892-2309 |
| PQID | 2462858293 |
| PQPubID | 85470 |
| PageCount | 13 |
| ParticipantIDs | crossref_primary_10_1109_TCAD_2020_2966451 crossref_citationtrail_10_1109_TCAD_2020_2966451 ieee_primary_8957699 proquest_journals_2462858293 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-12-01 |
| PublicationDateYYYYMMDD | 2020-12-01 |
| PublicationDate_xml | – month: 12 year: 2020 text: 2020-12-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on computer-aided design of integrated circuits and systems |
| PublicationTitleAbbrev | TCAD |
| PublicationYear | 2020 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| References | [1] doi:10.1016/j.neucom.2016.12.038; [2] Krizhevsky, "ImageNet Classification with Deep Convolutional Neural Networks," Proc. Adv. Neural Inf. Process. Syst., 2012, p. 1097; [3] Simonyan, "Very Deep Convolutional Networks for Large-Scale Image Recognition," 2014; [4] doi:10.1109/CVPR.2016.90; [5] Howard, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," 2017; [6] Zhang, "Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks," 2017; [7] doi:10.1109/TPAMI.2017.2700390; [8] doi:10.1109/TCSVT.2017.2736553; [9] doi:10.1016/j.neuroimage.2017.02.035; [10] doi:10.1016/j.patcog.2017.10.013; [11] doi:10.1109/JPROC.2017.2761740; [12] Han, "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding," 2015; [13] Zhang, "ADAM-ADMM: A Unified Systematic Framework of Structured Weight Pruning for DNNs," 2018; [14] Wen, "Learning Structured Sparsity in Deep Neural Networks," Proc. Adv. Neural Inf. Process. Syst., 2016, p. 2074; [15] doi:10.1145/3005348; [16] doi:10.1109/JSSC.2016.2616357; [17] doi:10.1109/ISSCC.2017.7870353; [18] doi:10.1145/3007787.3001138; [19] doi:10.1109/ISSCC.2019.8662302; [20] doi:10.1145/3079856.3080254; [21] doi:10.1109/VLSIC.2018.8502404; [22] doi:10.5244/C.29.31; [23] Lee, "Stitch-X: An Accelerator Architecture for Exploiting Unstructured Sparsity in Deep Neural Networks," Proc. SysML Conf., 2018; [24] doi:10.1145/3352460.3358275; [25] Ye, "Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates Using ADMM," CoRR, abs/1903.09769, 2019; [26] Krizhevsky, "Learning Multiple Layers of Features from Tiny Images," 2009; [27] doi:10.1109/CVPR.2009.5206848; [28] doi:10.1109/A-SSCC47793.2019.9056918 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 5170 |
| SubjectTerms | Accelerators; Algorithms; Application specific integrated circuits; Artificial neural networks; Collision rates; Complexity theory; Computer architecture; Convolution; Energy efficiency; Gallium arsenide; Group associated architecture; hash collision reduction; Indexes; Integrated circuits; Kernel; load-balancing algorithm; Modules; Performance degradation; Performance enhancement; scheduler module; sparse convolutional neural network (CNN) accelerator; Task analysis |
| Title | GAAS: An Efficient Group Associated Architecture and Scheduler Module for Sparse CNN Accelerators |
| URI | https://ieeexplore.ieee.org/document/8957699 https://www.proquest.com/docview/2462858293 |
| Volume | 39 |