GAAS: An Efficient Group Associated Architecture and Scheduler Module for Sparse CNN Accelerators

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 39, No. 12, pp. 5170-5182
Main Authors: Wang, Jingyu, Yuan, Zhe, Liu, Ruoyang, Feng, Xiaoyu, Du, Li, Yang, Huazhong, Liu, Yongpan
Format: Journal Article
Language: English
Published: New York: IEEE, 01.12.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 0278-0070, 1937-4151
Online Access: Get full text
Abstract Convolutional neural networks (CNNs) have become powerful algorithms for a wide range of tasks. Application-specific integrated circuits (ASICs) are widely used to accelerate CNNs on mobile platforms because of their high energy efficiency and performance. Meanwhile, CNNs have become much sparser with the development of network pruning algorithms. Recent works have employed different methods to improve the energy efficiency and performance of ASIC accelerators by exploiting the sparsity of CNNs. However, some of these methods suffer from large output memory overhead and from performance degradation induced by hash collisions. To overcome these problems, we propose GAAS, an efficient group associated architecture and scheduler module for sparse CNN accelerators, which achieves smaller output memory overhead and higher performance than state-of-the-art accelerators. GAAS consists of two main parts: 1) an n-way group associated architecture that reduces the output memory overhead and 2) a scheduler module that improves performance. In addition, a load-balancing algorithm is implemented in the scheduler module to further improve performance by reducing the hash collision rate. To demonstrate the efficiency of GAAS, we implement a 4-way image-principal associated architecture with a 16×16 PE array together with the scheduler module. Experimental results on AlexNet, VGG16, ResNet18, and MobileNet show that GAAS reduces the output memory overhead by 50% and improves performance by 1.53×, 1.62×, 1.46×, and 1.55×, respectively.
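The abstract's two mechanisms, group-associated output buffering and collision-aware scheduling, can be made concrete with a small behavioral model. The Python sketch below is illustrative only and is not the paper's design: the modulo hash, the 16-bank/4-way sizing, the tile-level flush that stands in for write-back of finished outputs, and the interleaved ordering that stands in for the load-balancing algorithm are all assumptions.

    # Behavioral sketch of an n-way group-associated output buffer.
    # Assumptions (not from the paper): modulo hashing, stall-on-collision,
    # and a flush between tiles that models write-back of finished outputs.

    class GroupAssociatedBuffer:
        def __init__(self, num_banks: int, ways: int):
            self.num_banks = num_banks
            self.ways = ways
            # Each bank holds at most `ways` (output address -> partial sum) entries.
            self.banks = [dict() for _ in range(num_banks)]

        def accumulate(self, out_addr: int, value: float) -> bool:
            bank = self.banks[out_addr % self.num_banks]  # assumed hash: modulo
            if out_addr in bank:
                bank[out_addr] += value   # hit: accumulate in place
                return True
            if len(bank) < self.ways:
                bank[out_addr] = value    # free way: allocate a new entry
                return True
            return False                  # all ways occupied: hash collision

    def collision_rate(num_banks, ways, addr_stream, tile=64):
        """Replay output addresses, flushing between tiles, and return the
        fraction of accesses that collide (i.e., would stall the PE array)."""
        buf = GroupAssociatedBuffer(num_banks, ways)
        misses = 0
        for t in range(0, len(addr_stream), tile):
            for a in addr_stream[t:t + tile]:
                if not buf.accumulate(a, 1.0):
                    misses += 1
            buf.banks = [dict() for _ in range(num_banks)]  # drain finished tile
        return misses / len(addr_stream)

    # A bursty order hammers one bank per tile; an interleaved order (a crude
    # stand-in for the scheduler's load balancing) spreads work across banks.
    bursty   = [i * 16 + b for b in range(16) for i in range(64)]
    balanced = [i * 16 + b for i in range(64) for b in range(16)]
    print(collision_rate(16, 4, bursty))    # 0.9375: nearly every access stalls
    print(collision_rate(16, 4, balanced))  # 0.0: collisions eliminated

Under these assumptions the effect the abstract describes is visible directly: with 16 banks of 4 ways, the interleaved schedule fits every tile into the buffer, while the bank-concentrated schedule collides on almost every access.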
Author Wang, Jingyu
Liu, Yongpan
Feng, Xiaoyu
Yang, Huazhong
Yuan, Zhe
Du, Li
Liu, Ruoyang
Author_xml – sequence: 1
  givenname: Jingyu
  orcidid: 0000-0002-7160-4165
  surname: Wang
  fullname: Wang, Jingyu
  organization: Department of Electronic Engineering, Tsinghua University, Beijing, China
– sequence: 2
  givenname: Zhe
  surname: Yuan
  fullname: Yuan, Zhe
  organization: Department of Electronic Engineering, Tsinghua University, Beijing, China
– sequence: 3
  givenname: Ruoyang
  orcidid: 0000-0001-9873-6574
  surname: Liu
  fullname: Liu, Ruoyang
  organization: Department of Electronic Engineering, Tsinghua University, Beijing, China
– sequence: 4
  givenname: Xiaoyu
  surname: Feng
  fullname: Feng, Xiaoyu
  organization: Department of Electronic Engineering, Tsinghua University, Beijing, China
– sequence: 5
  givenname: Li
  orcidid: 0000-0001-6346-6615
  surname: Du
  fullname: Du, Li
  email: duli@bupt.edu.cn
  organization: School of Information and Communication Engineering and Beijing Key Laboratory of Network System and Network Culture, Beijing University of Posts and Telecommunications, Beijing, China
– sequence: 6
  givenname: Huazhong
  orcidid: 0000-0003-2421-353X
  surname: Yang
  fullname: Yang, Huazhong
  organization: Department of Electronic Engineering, Tsinghua University, Beijing, China
– sequence: 7
  givenname: Yongpan
  orcidid: 0000-0002-4892-2309
  surname: Liu
  fullname: Liu, Yongpan
  email: ypliu@tsinghua.edu.cn
  organization: Department of Electronic Engineering, Tsinghua University, Beijing, China
CODEN ITCSDI
CitedBy_id crossref_primary_10_1016_j_neucom_2024_128700
crossref_primary_10_1109_JSSC_2021_3126625
crossref_primary_10_3390_electronics13081564
crossref_primary_10_1109_TVLSI_2023_3298509
Cites_doi 10.1145/3352460.3358275
10.1109/ISSCC.2017.7870353
10.1109/ISSCC.2019.8662302
10.1109/CVPR.2016.90
10.1016/j.patcog.2017.10.013
10.1145/3005348
10.1145/3079856.3080254
10.1016/j.neuroimage.2017.02.035
10.1109/CVPR.2009.5206848
10.5244/C.29.31
10.1016/j.neucom.2016.12.038
10.1109/JPROC.2017.2761740
10.1109/TCSVT.2017.2736553
10.1109/TPAMI.2017.2700390
10.1109/JSSC.2016.2616357
10.1145/3007787.3001138
10.1109/A-SSCC47793.2019.9056918
10.1109/VLSIC.2018.8502404
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
DBID 97E
RIA
RIE
AAYXX
CITATION
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
DOI 10.1109/TCAD.2020.2966451
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList Technology Research Database

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1937-4151
EndPage 5182
ExternalDocumentID 10_1109_TCAD_2020_2966451
8957699
Genre orig-research
GrantInformation_xml – fundername: NSFC
  grantid: 61934005; 61674094; 61720106013
  funderid: 10.13039/501100001809
– fundername: National Key Research and Development Program of China
  grantid: 2018YFA0701500
  funderid: 10.13039/501100012166
– fundername: Beijing National Research Center for Information Science and Technology
  funderid: 10.13039/501100017582
– fundername: Beijing Innovation Center for Future Chip
  funderid: 10.13039/501100012282
GroupedDBID --Z
-~X
0R~
29I
4.4
5GY
5VS
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABQJQ
ABVLG
ACGFS
ACIWK
ACNCT
AENEX
AETIX
AGQYO
AGSQL
AHBIQ
AI.
AIBXA
AKJIK
AKQYR
ALLEH
ALMA_UNASSIGNED_HOLDINGS
ASUFR
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
EBS
EJD
HZ~
H~9
IBMZZ
ICLAB
IFIPE
IFJZH
IPLJI
JAVBF
LAI
M43
O9-
OCL
P2P
PZZ
RIA
RIE
RNS
TN5
VH1
VJK
AAYXX
CITATION
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
IEDL.DBID RIE
ISICitedReferencesCount 4
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000592111400068&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 0278-0070
IngestDate Sun Jun 29 16:17:07 EDT 2025
Sat Nov 29 01:40:42 EST 2025
Tue Nov 18 22:31:17 EST 2025
Wed Aug 27 02:28:32 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 12
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
ORCID 0000-0002-7160-4165
0000-0003-2421-353X
0000-0001-6346-6615
0000-0001-9873-6574
0000-0002-4892-2309
PQID 2462858293
PQPubID 85470
PageCount 13
ParticipantIDs crossref_primary_10_1109_TCAD_2020_2966451
crossref_citationtrail_10_1109_TCAD_2020_2966451
ieee_primary_8957699
proquest_journals_2462858293
PublicationCentury 2000
PublicationDate 2020-12-01
PublicationDateYYYYMMDD 2020-12-01
PublicationDate_xml – month: 12
  year: 2020
  text: 2020-12-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on computer-aided design of integrated circuits and systems
PublicationTitleAbbrev TCAD
PublicationYear 2020
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References lee (ref23) 2018
ref15
ref11
ref10
ye (ref25) 2019; abs/1903.09769
ref1
ref17
krizhevsky (ref2) 2012
ref16
ref19
ref18
wen (ref14) 2016
ref24
ref20
ref22
ref21
zhang (ref6) 2017
zhang (ref13) 2018
ref28
ref27
simonyan (ref3) 2014
howard (ref5) 2017
ref8
ref7
krizhevsky (ref26) 2009
ref9
ref4
han (ref12) 2015
References_xml – ident: ref24
  doi: 10.1145/3352460.3358275
– ident: ref17
  doi: 10.1109/ISSCC.2017.7870353
– ident: ref19
  doi: 10.1109/ISSCC.2019.8662302
– volume: abs/1903.09769
  year: 2019
  ident: ref25
  article-title: Progressive DNN compression: A key to achieve ultra-high weight pruning and quantization rates using ADMM
  publication-title: CoRR
– year: 2015
  ident: ref12
  publication-title: Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
– ident: ref4
  doi: 10.1109/CVPR.2016.90
– year: 2014
  ident: ref3
  publication-title: Very Deep Convolutional Networks for Large-Scale Image Recognition
– ident: ref10
  doi: 10.1016/j.patcog.2017.10.013
– year: 2018
  ident: ref13
  publication-title: ADAM-ADMM: A Unified, Systematic Framework of Structured Weight Pruning for DNNs
– ident: ref15
  doi: 10.1145/3005348
– ident: ref20
  doi: 10.1145/3079856.3080254
– start-page: 2074
  year: 2016
  ident: ref14
  article-title: Learning structured sparsity in deep neural networks
  publication-title: Proc Adv Neural Inf Process Syst
– ident: ref9
  doi: 10.1016/j.neuroimage.2017.02.035
– ident: ref27
  doi: 10.1109/CVPR.2009.5206848
– ident: ref22
  doi: 10.5244/C.29.31
– ident: ref1
  doi: 10.1016/j.neucom.2016.12.038
– ident: ref11
  doi: 10.1109/JPROC.2017.2761740
– ident: ref8
  doi: 10.1109/TCSVT.2017.2736553
– start-page: 1097
  year: 2012
  ident: ref2
  article-title: ImageNet classification with deep convolutional neural networks
  publication-title: Proc Adv Neural Inf Process Syst
– year: 2018
  ident: ref23
  article-title: Stitch-X: An accelerator architecture for exploiting unstructured sparsity in deep neural networks
  publication-title: Proc SysML Conf
– ident: ref7
  doi: 10.1109/TPAMI.2017.2700390
– ident: ref16
  doi: 10.1109/JSSC.2016.2616357
– year: 2009
  ident: ref26
  article-title: Learning multiple layers of features from tiny images
– year: 2017
  ident: ref6
  publication-title: Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
– ident: ref18
  doi: 10.1145/3007787.3001138
– ident: ref28
  doi: 10.1109/A-SSCC47793.2019.9056918
– ident: ref21
  doi: 10.1109/VLSIC.2018.8502404
– year: 2017
  ident: ref5
  publication-title: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
SSID ssj0014529
Score 2.3306296
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 5170
SubjectTerms Accelerators
Algorithms
Application specific integrated circuits
Artificial neural networks
Collision rates
Complexity theory
Computer architecture
Convolution
Energy efficiency
Gallium arsenide
Group associated architecture
hash collision reduction
Indexes
Integrated circuits
Kernel
load-balancing algorithm
Modules
Performance degradation
Performance enhancement
scheduler module
sparse convolutional neural network (CNN) accelerator
Task analysis
Title GAAS: An Efficient Group Associated Architecture and Scheduler Module for Sparse CNN Accelerators
URI https://ieeexplore.ieee.org/document/8957699
https://www.proquest.com/docview/2462858293
Volume 39
WOSCitedRecordID wos000592111400068&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 1937-4151
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0014529
  issn: 0278-0070
  databaseCode: RIE
  dateStart: 19820101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3NS8MwFH_M4UEPfovTKTl4Ejv7kTaNtyKbHnQIU9itNMkbCKMb-_Dv9yXtxkQRPDWHJLT5JX2_l1_eC8C1SgqdEnHwkIfc40pafVehF6qEuJxIfa1cEtdn0e-nw6F8bcDtOhYGEd3hM-zYotPyzUQv7VbZXSqJHUu5BVtCJFWs1loxsAKi20-xGWNpHtcKZuDLuzf6KPIEQ78TErnncfDNBrlLVX78iZ156e3_78UOYK-mkSyrcD-EBpZHsLuRXPAYiscsG9yzrGRdlyaCemBup4mtIEHDsg0dgRWlYQMC0SzHOGMvE_tkRGrZYEruL7KHfp9lWpOhctr8_ATee923hyevvlDB02TVF15QRBgZRRxL6RhxpLgh_wl5kmhay2TrR4EfxciNEL4oYiPDQAfScOXr0EguolNolpMSz4AZpOVbJLEoUs6FFspEiYwwLMjltcG5LfBXQ5zrOtu4vfRinDuvw5e5RSW3qOQ1Ki24WTeZVqk2_qp8bGFYV6wRaEF7hWNeL8Z5Htr42zilITj_vdUF7Ni-q1MqbWguZku8hG39ufiYz67cPPsCDPjNag
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1bS-QwFD64urD6oK6uOF7zsE9iNU3TpvGtiDccizAj-Faa5AgL0pG5-Ps9yXQGF0XwqX1I2iZf0vOdfDknAH9NVtuciEOEUshIGu31XYORMBlxOZVza0IS164qy_zxUd8vwPE8FgYRw-YzPPG3Qct3AzvxS2WnuSZ2rPUPWEqlFHwarTXXDLyEGFZUfM5YGsmthhlzfdqnZpEvKPiJIHov0_g_KxSOVfnwLw4G5nLte5-2DqstkWTFFPnfsIDNBqy8Sy-4CfVVUfTOWNGwi5Aogp7AwloTm4GCjhXvlARWN471CEY3ecYhuxv4KyNay3ov5AAjOy9LVlhLpiqo86M_8HB50T-_jtojFSJLdn0cxXWCiTPEsoxNEZ-MdORBocwyS7OZrP1TzJMUpVOKqzp1WsQ21k4aboXTUiVbsNgMGtwG5pAmcJ2lqs6lVFYZl2Q6QVGT0-vDczvAZ11c2TbfuD_24rkKfgfXlUel8qhULSodOJpXeZkm2_iq8KaHYV6wRaADezMcq3Y6jirhI3DTnLpg5_Nah_Drun_Xrbo35e0uLPv3TPes7MHieDjBffhpX8f_RsODMObeAOg60LE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=GAAS%3A+An+Efficient+Group+Associated+Architecture+and+Scheduler+Module+for+Sparse+CNN+Accelerators&rft.jtitle=IEEE+transactions+on+computer-aided+design+of+integrated+circuits+and+systems&rft.au=Wang%2C+Jingyu&rft.au=Yuan%2C+Zhe&rft.au=Liu%2C+Ruoyang&rft.au=Feng%2C+Xiaoyu&rft.date=2020-12-01&rft.pub=The+Institute+of+Electrical+and+Electronics+Engineers%2C+Inc.+%28IEEE%29&rft.issn=0278-0070&rft.eissn=1937-4151&rft.volume=39&rft.issue=12&rft.spage=5170&rft_id=info:doi/10.1109%2FTCAD.2020.2966451&rft.externalDBID=NO_FULL_TEXT