Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster With 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode

Bibliographic Details
Published in: IEEE transactions on circuits and systems. I, Regular papers Vol. 70; no. 6; pp. 2450-2463
Main Authors: Ottavi, Gianmarco, Garofalo, Angelo, Tagliavini, Giuseppe, Conti, Francesco, Mauro, Alfio Di, Benini, Luca, Rossi, Davide
Format: Journal Article
Language: English
Published: New York IEEE 01.06.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 1549-8328, 1558-0806
Abstract Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms to resource-constrained, battery-powered devices while retaining the flexibility of instruction-processor-based architectures poses several challenges related to memory footprint, computational throughput, and energy efficiency. Low-bitwidth and mixed-precision arithmetic have proven to be valid strategies for tackling these problems. We present Dustin, a fully programmable compute cluster integrating 16 RISC-V cores capable of 2- to 32-bit arithmetic and all possible mixed-precision combinations. In addition to a conventional Multiple-Instruction Multiple-Data (MIMD) processing paradigm, Dustin introduces a Vector Lockstep Execution Mode (VLEM) to minimize power consumption in highly data-parallel kernels. In VLEM, a single leader core fetches instructions and broadcasts them to the 15 follower cores. Clock gating the Instruction Fetch (IF) stages and private caches of the follower cores leads to a 38% power reduction. The cluster, implemented in 65 nm CMOS technology, achieves a peak performance of 58 GOPS and a peak efficiency of 1.15 TOPS/W.
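The mixed-precision arithmetic mentioned in the abstract can be illustrated with a small software model. The sketch below is not Dustin's instruction set or its hardware datapath; it only shows the general idea of packing low-bitwidth operands into 32-bit words and computing a dot product between operands of different widths (here 8-bit activations and 2-bit weights). The function names `pack`, `unpack`, and `mixed_dot` are hypothetical and chosen for illustration only.

```python
def pack(values, bits):
    """Pack signed integers of width `bits` into 32-bit words, lowest lane first."""
    lanes = 32 // bits
    mask = (1 << bits) - 1
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    words = []
    for start in range(0, len(values), lanes):
        word = 0
        for lane, value in enumerate(values[start:start + lanes]):
            if not low <= value <= high:
                raise ValueError(f"{value} does not fit in {bits} signed bits")
            # Two's-complement encode the value into its lane.
            word |= (value & mask) << (lane * bits)
        words.append(word)
    return words


def unpack(words, bits, count):
    """Recover the first `count` signed values from packed 32-bit words."""
    lanes = 32 // bits
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    values = []
    for word in words:
        for lane in range(lanes):
            raw = (word >> (lane * bits)) & mask
            # Sign-extend the lane back to a Python integer.
            values.append(raw - (1 << bits) if raw & sign_bit else raw)
    return values[:count]


def mixed_dot(a_words, a_bits, w_words, w_bits, count):
    """Dot product of two packed vectors that use different bit widths."""
    a = unpack(a_words, a_bits, count)
    w = unpack(w_words, w_bits, count)
    return sum(x * y for x, y in zip(a, w))


activations = [3, -4, 7, 100, -128, 5, 0, 1]  # each fits in 8 signed bits
weights = [1, -2, 0, 1, -1, 1, -2, 0]         # each fits in 2 signed bits (-2..1)

a_words = pack(activations, 8)   # 4 lanes per word, so 2 words
w_words = pack(weights, 2)       # 16 lanes per word, so 1 word
result = mixed_dot(a_words, 8, w_words, 2, len(activations))
print(len(a_words), len(w_words), result)
```

In this model the eight 2-bit weights occupy a single 32-bit word while the eight 8-bit activations occupy two, which is the memory-footprint benefit the abstract attributes to low-bitwidth formats. A hardware implementation would operate on the packed lanes directly rather than unpacking them one by one as this sketch does.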
Author_xml – sequence: 1
  givenname: Gianmarco
  orcidid: 0000-0003-0041-7917
  surname: Ottavi
  fullname: Ottavi, Gianmarco
  email: gianmarco.ottavi2@unibo.it
  organization: Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Bologna, Italy
– sequence: 2
  givenname: Angelo
  orcidid: 0000-0002-7495-6895
  surname: Garofalo
  fullname: Garofalo, Angelo
  organization: Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Bologna, Italy
– sequence: 3
  givenname: Giuseppe
  orcidid: 0000-0002-9221-4633
  surname: Tagliavini
  fullname: Tagliavini, Giuseppe
  organization: Department of Computer Science and Engineering (DISI), University of Bologna, Bologna, Italy
– sequence: 4
  givenname: Francesco
  orcidid: 0000-0002-7924-933X
  surname: Conti
  fullname: Conti, Francesco
  organization: Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Bologna, Italy
– sequence: 5
  givenname: Alfio Di
  orcidid: 0000-0001-6688-1603
  surname: Mauro
  fullname: Mauro, Alfio Di
  organization: IIS Integrated Systems Laboratory, ETH Zürich, Zürich, Switzerland
– sequence: 6
  givenname: Luca
  orcidid: 0000-0001-8068-3806
  surname: Benini
  fullname: Benini, Luca
  organization: Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Bologna, Italy
– sequence: 7
  givenname: Davide
  orcidid: 0000-0002-0651-5393
  surname: Rossi
  fullname: Rossi, Davide
  organization: Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Bologna, Italy
CODEN ITCSCH
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
DOI 10.1109/TCSI.2023.3254810
Discipline Engineering
EISSN 1558-0806
EndPage 2463
ExternalDocumentID 10_1109_TCSI_2023_3254810
10071725
Genre orig-research
GrantInformation_xml – fundername: European Commission Horizon 2020 Framework through The European Pilot project
  grantid: 101034126
  funderid: 10.13039/501100000780
– fundername: WiPLASH project
  grantid: 863337
– fundername: ECSEL Joint Undertaking Horizon 2020 through the AI4DI
  grantid: 826060
– fundername: GreenWaves Technologies
ISICitedReferencesCount 13
ISSN 1549-8328
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 6
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
OpenAccessLink https://ieeexplore.ieee.org/document/10071725
PublicationCentury 2000
PublicationDate 2023-06-01
PublicationDateYYYYMMDD 2023-06-01
PublicationDate_xml – month: 06
  year: 2023
  text: 2023-06-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on circuits and systems. I, Regular papers
PublicationTitleAbbrev TCSI
PublicationYear 2023
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 2450
SubjectTerms Algorithms
Arithmetic
Artificial neural networks
Clusters
Computer architecture
Electronic devices
Energy efficiency
Field programmable gate arrays
Hardware
Microprocessors
MIMD
mixed-precision
Power consumption
Power management
Program processors
QNN inference
Quantization (signal)
RISC
RISC-V
SIMD
URI https://ieeexplore.ieee.org/document/10071725
https://www.proquest.com/docview/2819497537
Volume 70