An Efficient Training Accelerator for Transformers With Hardware-Algorithm Co-Optimization


Bibliographic Details
Published in: IEEE transactions on very large scale integration (VLSI) systems, Vol. 31, No. 11, pp. 1788 - 1801
Main Authors: Shao, Haikuo, Lu, Jinming, Wang, Meiqi, Wang, Zhongfeng
Format: Journal Article
Language: English
Published: New York: IEEE, 01.11.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 1063-8210, 1557-9999
Online Access: Full text
Abstract Transformers have achieved significant success in deep learning, and training Transformers efficiently on resource-constrained platforms has been attracting continuous attention for domain adaptation and privacy concerns. However, deploying Transformer training on these platforms is still challenging due to its dynamic workloads, intensive computations, and massive memory accesses. To address these issues, we propose an Efficient Training Accelerator for TRansformers (TRETA) through a hardware-algorithm co-optimization strategy. First, a hardware-friendly mixed-precision training algorithm is presented based on a compact and efficient data format, which significantly reduces the computation and memory requirements. Second, a flexible and scalable architecture is proposed to achieve high utilization of computing resources when processing arbitrary irregular general matrix multiplication (GEMM) operations during training. These irregular GEMMs lead to severe under-utilization when simply mapped onto traditional systolic architectures. Third, we develop training-oriented architectures for the crucial Softmax and layer normalization functions in Transformers, respectively. These area-efficient modules have unified and flexible microarchitectures to meet the various computation requirements of different training phases. Finally, TRETA is implemented in Taiwan Semiconductor Manufacturing Company (TSMC) 28-nm technology and evaluated on multiple benchmarks. The experimental results show that our training framework achieves the same accuracy as the full-precision baseline. Moreover, TRETA achieves 14.71 tera operations per second (TOPS) and 3.31 TOPS/W in terms of throughput and energy efficiency, respectively. Compared with prior arts, the proposed design shows 1.4-24.5× speedup and 1.5-25.4× energy efficiency improvement.
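The abstract's point that irregular GEMMs under-utilize plain systolic arrays can be made concrete with a small back-of-the-envelope calculation. The Python sketch below is illustrative only and is not taken from the TRETA paper; the 32x32 tile size, the helper name systolic_utilization, and the example GEMM shapes are assumptions chosen for this record.

# Minimal utilization sketch (assumption-based, not TRETA's architecture):
# estimate the fraction of a fixed tile_h x tile_w systolic array that does
# useful work when an M x N output matrix is tiled onto it, with edge tiles padded.
import math

def systolic_utilization(M, N, tile_h=32, tile_w=32):
    tiles = math.ceil(M / tile_h) * math.ceil(N / tile_w)
    useful = M * N                    # output elements that are actually needed
    issued = tiles * tile_h * tile_w  # PE output slots occupied, padding included
    return useful / issued

# Square GEMMs map well; the skinny, oddly shaped products that arise in
# per-head attention and small-batch training waste most of the array.
for (M, N) in [(512, 512), (197, 64), (12, 384), (3, 768)]:
    print(f"{M:>4} x {N:<4} GEMM -> array utilization {systolic_utilization(M, N):.1%}")

In this simplified model only tile padding is counted; pipeline fill/drain effects and the reduction (K) dimension are ignored, so it understates the under-utilization problem that the paper's flexible GEMM architecture is designed to address.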
Author Wang, Zhongfeng
Wang, Meiqi
Shao, Haikuo
Lu, Jinming
Author_xml – sequence: 1
  givenname: Haikuo
  orcidid: 0009-0008-6965-3436
  surname: Shao
  fullname: Shao, Haikuo
  email: hkshao@smail.nju.edu.cn
  organization: School of Electronic Science and Engineering, Nanjing University, Nanjing, China
– sequence: 2
  givenname: Jinming
  orcidid: 0000-0002-7134-6514
  surname: Lu
  fullname: Lu, Jinming
  email: jmlu@smail.nju.edu.cn
  organization: School of Electronic Science and Engineering, Nanjing University, Nanjing, China
– sequence: 3
  givenname: Meiqi
  orcidid: 0000-0001-9553-3640
  surname: Wang
  fullname: Wang, Meiqi
  email: wangmq53@mail.sysu.edu.cn
  organization: School of Integrated Circuits, Sun Yat-sen University, Shenzhen, China
– sequence: 4
  givenname: Zhongfeng
  orcidid: 0000-0002-7227-4786
  surname: Wang
  fullname: Wang, Zhongfeng
  email: zfwang@nju.edu.cn
  organization: School of Electronic Science and Engineering, Nanjing University, Nanjing, China
CODEN IEVSE9
CitedBy_id crossref_primary_10_1109_JETCAS_2025_3555970
crossref_primary_10_1109_JETCAS_2025_3575272
crossref_primary_10_1109_TVLSI_2025_3552534
crossref_primary_10_1109_TVLSI_2025_3553069
crossref_primary_10_1109_TCSII_2025_3591633
crossref_primary_10_1109_TVLSI_2024_3432403
crossref_primary_10_1109_TVLSI_2025_3561000
Cites_doi 10.1109/TCSI.2020.3021397
10.1109/JSSC.2023.3234893
10.1109/CVPR42600.2020.00807
10.1109/CVPR.2016.90
10.1609/aaai.v35i4.16462
10.1109/HPCA47549.2020.00015
10.18653/v1/W18-6313
10.1109/TPDS.2022.3149787
10.1109/VLSIC.2018.8502276
10.1109/ISPASS48437.2020.00016
10.1109/ISSCC.2019.8662302
10.1007/s11263-021-01453-z
10.1109/CVPR42600.2020.00204
10.1109/JSSC.2021.3120113
10.1109/5.726791
10.1145/3079856.3080246
10.1109/ICASSP39728.2021.9413535
10.1109/ICCV48922.2021.00986
10.1109/CVPR.2009.5206848
10.1109/VLSICircuits18222.2020.9162917
10.21236/ADA273556
10.1109/ISCA52012.2021.00061
10.1109/ICASSP.2018.8462506
10.1109/OJSSCS.2021.3119554
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
DBID 97E
RIA
RIE
AAYXX
CITATION
7SP
8FD
L7M
DOI 10.1109/TVLSI.2023.3305569
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Electronics & Communications Abstracts
Technology Research Database
Advanced Technologies Database with Aerospace
DatabaseTitle CrossRef
Technology Research Database
Advanced Technologies Database with Aerospace
Electronics & Communications Abstracts
DatabaseTitleList Technology Research Database

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1557-9999
EndPage 1801
ExternalDocumentID 10_1109_TVLSI_2023_3305569
10251161
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62174084
  funderid: 10.13039/501100001809
– fundername: National Key Research and Development Program of China
  grantid: 2022YFB4400604
  funderid: 10.13039/501100012166
GroupedDBID -~X
.DC
0R~
29I
3EH
4.4
5GY
5VS
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABFSI
ABQJQ
ABVLG
ACGFS
ACIWK
AENEX
AETIX
AGQYO
AGSQL
AHBIQ
AI.
AIBXA
AKJIK
AKQYR
ALLEH
ALMA_UNASSIGNED_HOLDINGS
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
E.L
EBS
EJD
HZ~
H~9
ICLAB
IEDLZ
IFIPE
IFJZH
IPLJI
JAVBF
LAI
M43
O9-
OCL
P2P
RIA
RIE
RNS
TN5
VH1
AAYXX
CITATION
7SP
8FD
L7M
ID FETCH-LOGICAL-c296t-d9daa7f03e14809d02454e3b401c88fbb4e33e420ab9f26625e105fc386bdced3
IEDL.DBID RIE
ISICitedReferencesCount 9
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001068976200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1063-8210
IngestDate Sun Nov 09 08:55:55 EST 2025
Tue Nov 18 20:53:12 EST 2025
Sat Nov 29 03:36:21 EST 2025
Wed Aug 27 02:34:55 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c296t-d9daa7f03e14809d02454e3b401c88fbb4e33e420ab9f26625e105fc386bdced3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ORCID 0000-0001-9553-3640
0009-0008-6965-3436
0000-0002-7134-6514
0000-0002-7227-4786
PQID 2878508356
PQPubID 85424
PageCount 14
ParticipantIDs proquest_journals_2878508356
crossref_citationtrail_10_1109_TVLSI_2023_3305569
crossref_primary_10_1109_TVLSI_2023_3305569
ieee_primary_10251161
PublicationCentury 2000
PublicationDate 2023-11-01
PublicationDateYYYYMMDD 2023-11-01
PublicationDate_xml – month: 11
  year: 2023
  text: 2023-11-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on very large scale integration (VLSI) systems
PublicationTitleAbbrev TVLSI
PublicationYear 2023
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref35
ref12
ref34
ref15
ref37
ref36
ref33
wang (ref22) 2022; 65
ref10
lan (ref4) 2019
radford (ref5) 2019; 1
ref39
ref19
ref18
han (ref16) 2015
nagel (ref17) 2021
parmar (ref8) 2018
ref24
ref45
ref26
ref25
ref20
brown (ref14) 2020
ref41
micikevicius (ref23) 2017
ref21
lu (ref30) 2022
liu (ref3) 2019
ref28
ref27
devlin (ref2) 2019
ref29
drumond (ref31) 2018
ref7
noh (ref38) 2022
ref9
dosovitskiy (ref6) 2020
vaswani (ref1) 2017
krizhevsky (ref44) 2009
huang (ref11) 2018
radford (ref13) 2018
kalamkar (ref32) 2019
ref40
merity (ref42) 2016
hill (ref43) 2015
References_xml – ident: ref20
  doi: 10.1109/TCSI.2020.3021397
– year: 2018
  ident: ref11
  article-title: Music transformer
  publication-title: arXiv:1809.04281
– ident: ref21
  doi: 10.1109/JSSC.2023.3234893
– ident: ref19
  doi: 10.1109/CVPR42600.2020.00807
– year: 2009
  ident: ref44
  publication-title: Learning multiple layers of features from tiny images
– ident: ref15
  doi: 10.1109/CVPR.2016.90
– year: 2022
  ident: ref30
  article-title: ETA: An efficient training accelerator for DNNs based on hardware-algorithm co-optimization
  publication-title: IEEE Trans Neural Netw Learn Syst
– ident: ref33
  doi: 10.1609/aaai.v35i4.16462
– year: 2019
  ident: ref4
  article-title: ALBERT: A lite BERT for self-supervised learning of language representations
  publication-title: arXiv:1909.11942
– ident: ref36
  doi: 10.1109/HPCA47549.2020.00015
– ident: ref39
  doi: 10.18653/v1/W18-6313
– year: 2015
  ident: ref43
  article-title: The Goldilocks principle: Reading children's books with explicit memory representations
  publication-title: arXiv:1511.02301
– ident: ref27
  doi: 10.1109/TPDS.2022.3149787
– ident: ref25
  doi: 10.1109/VLSIC.2018.8502276
– ident: ref40
  doi: 10.1109/ISPASS48437.2020.00016
– ident: ref26
  doi: 10.1109/ISSCC.2019.8662302
– ident: ref18
  doi: 10.1007/s11263-021-01453-z
– ident: ref24
  doi: 10.1109/CVPR42600.2020.00204
– ident: ref35
  doi: 10.1109/JSSC.2021.3120113
– year: 2020
  ident: ref14
  article-title: Language models are few-shot learners
  publication-title: arXiv:2005.14165
– start-page: 1
  year: 2018
  ident: ref31
  article-title: Training DNNs with hybrid block floating point
  publication-title: Proc Adv Neural Inf Process Syst
– year: 2018
  ident: ref13
  publication-title: Improving language understanding by generative pre-training
– ident: ref29
  doi: 10.1109/5.726791
– year: 2019
  ident: ref2
  article-title: BERT: Pre-training of deep bidirectional transformers for language understanding
  publication-title: arXiv:1810.04805
– ident: ref28
  doi: 10.1145/3079856.3080246
– year: 2019
  ident: ref3
  article-title: RoBERTa: A robustly optimized BERT pretraining approach
  publication-title: arXiv:1907.11692
– ident: ref10
  doi: 10.1109/ICASSP39728.2021.9413535
– ident: ref7
  doi: 10.1109/ICCV48922.2021.00986
– year: 2020
  ident: ref6
  article-title: An image is worth 16×16 words: Transformers for image recognition at scale
  publication-title: arXiv:2010.11929
– ident: ref45
  doi: 10.1109/CVPR.2009.5206848
– year: 2021
  ident: ref17
  article-title: A white paper on neural network quantization
  publication-title: arXiv:2106.08295
– volume: 65
  start-page: 1
  year: 2022
  ident: ref22
  article-title: A 28 nm 27.5TOPS/W approximate-computing-based transformer processor with asymptotic sparsity speculating and out-of-order computing
  publication-title: IEEE Int Solid-State Circuits Conf (ISSCC) Dig Tech Papers
– year: 2022
  ident: ref38
  article-title: FlexBlock: A flexible DNN training accelerator with multi-mode block floating point support
  publication-title: arXiv:2203.06673
– start-page: 1
  year: 2015
  ident: ref16
  article-title: Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding
  publication-title: Proc Int Conf Learn Represent
– year: 2019
  ident: ref32
  article-title: A study of BFLOAT16 for deep learning training
  publication-title: arXiv:1905.12322
– ident: ref34
  doi: 10.1109/VLSICircuits18222.2020.9162917
– ident: ref41
  doi: 10.21236/ADA273556
– year: 2017
  ident: ref1
  article-title: Attention is all you need
  publication-title: Proc NIPS
– ident: ref37
  doi: 10.1109/ISCA52012.2021.00061
– year: 2016
  ident: ref42
  article-title: Pointer sentinel mixture models
  publication-title: arXiv:1609.07843
– ident: ref9
  doi: 10.1109/ICASSP.2018.8462506
– start-page: 4055
  year: 2018
  ident: ref8
  article-title: Image transformer
  publication-title: Proc 35th Int Conf Mach Learn (ICML)
– ident: ref12
  doi: 10.1109/OJSSCS.2021.3119554
– year: 2017
  ident: ref23
  article-title: Mixed precision training
  publication-title: arXiv:1710.03740
– volume: 1
  start-page: 9
  year: 2019
  ident: ref5
  article-title: Language models are unsupervised multitask learners
  publication-title: OpenAI Blog
SSID ssj0014490
Score 2.4822738
Snippet Transformers have achieved significant success in deep learning, and training Transformers efficiently on resource-constrained platforms has been attracting...
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 1788
SubjectTerms Algorithm-hardware codesign
Algorithms
Computational modeling
Computer architecture
Computer memory
Data models
Energy efficiency
general matrix multiplication (GEMM)
Hardware
Memory management
nonlinear function
Optimization
Platforms
Task analysis
Technology assessment
Training
training accelerator
Transformer
Transformers
Title An Efficient Training Accelerator for Transformers With Hardware-Algorithm Co-Optimization
URI https://ieeexplore.ieee.org/document/10251161
https://www.proquest.com/docview/2878508356
Volume 31
WOSCitedRecordID wos001068976200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 1557-9999
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0014490
  issn: 1063-8210
  databaseCode: RIE
  dateStart: 19930101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3PS8MwFA46POjB3-J0Sg7eJLNr2iY5lrGhIFNw6vBSmuRVB1sr-6H_vknajYkoeGtLXil5Td73-vp9D6ELyUOgClok40oTi6CJlC1JqA9pSwrONaSu2QTr9fhgIO4rsrrjwgCA-_kMmvbQ1fJ1oeb2U5lZ4RYQ22RnnbGoJGstSwZBIErpgYgSbhKZBUPGE1f9p9uHm6ZtFN6kThJLfItCrq3Kj73YBZjuzj8fbRdtV0gSx6Xr99Aa5Ptoa0Vf8AC9xDnuOI0IY4z7VTcIHCtlgo2rr2ODWXF_AV4NFMTPw9kbtvX8z3QCJB69FhNzZYzbBbkz28u44m0eosdup9--JlUzBaJ8Ec2IFjpNWeZRMAmQJ7QtuQZApcmvFOeZlOaEQuB7qRSZidp-CAZ6ZYrySGoFmh6hWl7kcIxw5EEKzCCdQMuAhUxSZmBlpv2AQZgFfh21FpObqEpp3Da8GCUu4_BE4hySWIcklUPq6HJp817qbPw5-tC6YGVkOft11Fg4ManW4jQxOSG3ovdhdPKL2SnatHcvKYYNVJtN5nCGNtTHbDidnLvX7AvUmM_u
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3fS8MwEA6igvrgb3E6NQ--SWbXpG3yOGRDcU7B-gNfSpNcVdBN5tR_30vayUQUfGtLQstdk_u-Xu87Qva1jIAbaLJCGsscgmZaNzXjIeRNraS0kPtmE0mvJ29v1UVVrO5rYQDA_3wGDXfoc_l2YN7cpzJc4Q4QO7IzEwkRBmW51lfSQAhVig_EnEmkMuMamUAdptfdy5OGaxXe4F4US32LQ76xyo_d2IeYztI_H26ZLFZYkrZK56-QKeivkoUJhcE1ctfq07ZXicDJNK36QdCWMRhufIadImql6Ri-IhikN4-jB-oy-h_5EFjr6X4wxCvP9GjAznGDea4qN9fJVaedHh2zqp0CM6GKR8wqm-dJEXBAChQo65KuArhGhmWkLLTGEw5o1lyrAuN2GAGCr8JwGWtrwPINMt0f9GGT0DiAHBLEOsJqkUSJ5gkCy8KGIoGoEGGNNMfGzUylNe5aXjxlnnMEKvMOyZxDssohNXLwNeelVNr4c_S6c8HEyNL6NVIfOzGrVuNrhqxQOtn7KN76ZdoemTtOz7pZ96R3uk3m3Z3KgsM6mR4N32CHzJr30ePrcNe_cp_P6NM1
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+Efficient+Training+Accelerator+for+Transformers+With+Hardware-Algorithm+Co-Optimization&rft.jtitle=IEEE+transactions+on+very+large+scale+integration+%28VLSI%29+systems&rft.au=Shao%2C+Haikuo&rft.au=Lu%2C+Jinming&rft.au=Wang%2C+Meiqi&rft.au=Wang%2C+Zhongfeng&rft.date=2023-11-01&rft.pub=The+Institute+of+Electrical+and+Electronics+Engineers%2C+Inc.+%28IEEE%29&rft.issn=1063-8210&rft.eissn=1557-9999&rft.volume=31&rft.issue=11&rft.spage=1788&rft_id=info:doi/10.1109%2FTVLSI.2023.3305569&rft.externalDBID=NO_FULL_TEXT
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=1063-8210&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=1063-8210&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1063-8210&client=summon