Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster With 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode
Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms on resource-constrained and battery-powered devices while retaining the flexibility granted by instruction processor-based architecture...
| Published in: | IEEE transactions on circuits and systems. I, Regular papers Vol. 70; no. 6; pp. 2450 - 2463 |
|---|---|
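| Main Authors: | Ottavi, Gianmarco; Garofalo, Angelo; Tagliavini, Giuseppe; Conti, Francesco; Di Mauro, Alfio; Benini, Luca; Rossi, Davide |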
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE, 01.06.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| ISSN: | 1549-8328, 1558-0806 |
| Abstract | Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms on resource-constrained and battery-powered devices while retaining the flexibility granted by instruction processor-based architectures poses several challenges related to memory footprint, computational throughput, and energy efficiency. Low-bitwidth and mixed-precision arithmetic have been proven to be valid strategies for tackling these problems. We present Dustin, a fully programmable compute cluster integrating 16 RISC-V cores capable of 2- to 32-bit arithmetic and all possible mixed-precision combinations. In addition to a conventional Multiple-Instruction Multiple-Data (MIMD) processing paradigm, Dustin introduces a Vector Lockstep Execution Mode (VLEM) to minimize power consumption in highly data-parallel kernels. In VLEM, a single leader core fetches instructions and broadcasts them to the 15 follower cores. Clock gating Instruction Fetch (IF) stages and private caches of the follower cores leads to 38% power reduction. The cluster, implemented in 65 nm CMOS technology, achieves a peak performance of 58 GOPS and a peak efficiency of 1.15 TOPS/W. |
|---|---|
| Author | Garofalo, Angelo; Di Mauro, Alfio; Benini, Luca; Rossi, Davide; Tagliavini, Giuseppe; Ottavi, Gianmarco; Conti, Francesco |
| Author_xml | – sequence: 1 givenname: Gianmarco orcidid: 0000-0003-0041-7917 surname: Ottavi fullname: Ottavi, Gianmarco email: gianmarco.ottavi2@unibo.it organization: Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Bologna, Italy – sequence: 2 givenname: Angelo orcidid: 0000-0002-7495-6895 surname: Garofalo fullname: Garofalo, Angelo organization: Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Bologna, Italy – sequence: 3 givenname: Giuseppe orcidid: 0000-0002-9221-4633 surname: Tagliavini fullname: Tagliavini, Giuseppe organization: Department of Computer Science and Engineering (DISI), University of Bologna, Bologna, Italy – sequence: 4 givenname: Francesco orcidid: 0000-0002-7924-933X surname: Conti fullname: Conti, Francesco organization: Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Bologna, Italy – sequence: 5 givenname: Alfio orcidid: 0000-0001-6688-1603 surname: Di Mauro fullname: Di Mauro, Alfio organization: IIS Integrated Systems Laboratory, ETH Zürich, Zürich, Switzerland – sequence: 6 givenname: Luca orcidid: 0000-0001-8068-3806 surname: Benini fullname: Benini, Luca organization: Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Bologna, Italy – sequence: 7 givenname: Davide orcidid: 0000-0002-0651-5393 surname: Rossi fullname: Rossi, Davide organization: Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Bologna, Italy |
| CODEN | ITCSCH |
| CitedBy_id | crossref_primary_10_1145_3768630 crossref_primary_10_1109_ACCESS_2024_3380472 crossref_primary_10_1109_ACCESS_2024_3401831 crossref_primary_10_1109_TVLSI_2024_3466224 crossref_primary_10_1016_j_vlsi_2024_102282 crossref_primary_10_1145_3729215 crossref_primary_10_1109_ACCESS_2025_3582013 crossref_primary_10_3390_jlpea13010005 |
| Cites_doi | 10.1145/3387902.3394038 10.7873/DATE.2013.090 10.1109/JSSC.2019.2912307 10.1109/JSSC.2021.3056219 10.1109/CVPR.2019.00881 10.1109/CVPR42600.2020.00242 10.1109/DATE.2012.6176639 10.1109/TETC.2021.3072337 10.1109/TCSII.2020.2983648 10.1109/ISSCC19947.2020.9062989 10.1098/rsta.2019.0155 10.23919/DATE51398.2021.9474087 10.1109/JSSC.2021.3114881 10.1109/DAC.2018.8465915 10.23919/VLSIC.2017.8008534 10.1109/TPDS.2020.3028691 10.1109/ICCV.2019.00038 10.1109/MWSCAS.2010.5548579 10.1109/JETCAS.2019.2910232 10.1109/ISSCC.2018.8310262 10.1109/CVPR.2018.00474 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
| DBID | 97E RIA RIE AAYXX CITATION 7SP 8FD L7M |
| DOI | 10.1109/TCSI.2023.3254810 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE/IET Electronic Library CrossRef Electronics & Communications Abstracts Technology Research Database Advanced Technologies Database with Aerospace |
| DatabaseTitle | CrossRef Technology Research Database Advanced Technologies Database with Aerospace Electronics & Communications Abstracts |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISSN | 1558-0806 |
| EndPage | 2463 |
| ExternalDocumentID | 10_1109_TCSI_2023_3254810 10071725 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: European Commission Horizon 2020 Framework through The European Pilot project grantid: 101034126 funderid: 10.13039/501100000780 – fundername: WiPLASH project grantid: 863337 – fundername: ECSEL Joint Undertaking Horizon 2020 through the AI4DI grantid: 826060 – fundername: GreenWaves Technologies |
| GroupedDBID | 0R~ 29I 4.4 5VS 6IK 97E AAJGR AARMG AASAJ AAWTH ABAZT ABQJQ ABVLG ACIWK AETIX AGQYO AGSQL AHBIQ AIBXA AKJIK AKQYR ALMA_UNASSIGNED_HOLDINGS ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ EBS EJD HZ~ H~9 IFIPE IPLJI JAVBF M43 O9- OCL PZZ RIA RIE RNS VJK AAYXX CITATION 7SP 8FD L7M |
| ID | FETCH-LOGICAL-c337t-10b898c3e6d4b0f177389e782c7a9fedad7ec05d6052dc52ef637c6706f1626d3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 13 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000953712700001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1549-8328 |
| IngestDate | Mon Jun 30 08:33:21 EDT 2025 Sat Nov 29 06:23:57 EST 2025 Tue Nov 18 19:50:36 EST 2025 Wed Aug 27 02:18:06 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 6 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c337t-10b898c3e6d4b0f177389e782c7a9fedad7ec05d6052dc52ef637c6706f1626d3 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ORCID | 0000-0003-0041-7917 0000-0001-6688-1603 0000-0001-8068-3806 0000-0002-9221-4633 0000-0002-7924-933X 0000-0002-0651-5393 0000-0002-7495-6895 |
| OpenAccessLink | https://ieeexplore.ieee.org/document/10071725 |
| PQID | 2819497537 |
| PQPubID | 85411 |
| PageCount | 14 |
| ParticipantIDs | ieee_primary_10071725 crossref_citationtrail_10_1109_TCSI_2023_3254810 crossref_primary_10_1109_TCSI_2023_3254810 proquest_journals_2819497537 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-06-01 |
| PublicationDateYYYYMMDD | 2023-06-01 |
| PublicationDate_xml | – month: 06 year: 2023 text: 2023-06-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on circuits and systems. I, Regular papers |
| PublicationTitleAbbrev | TCSI |
| PublicationYear | 2023 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0029031 |
| Score | 2.5129993 |
| Snippet | Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel... |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 2450 |
| SubjectTerms | Algorithms; Arithmetic; Artificial neural networks; Clusters; Computer architecture; Electronic devices; Energy efficiency; Field programmable gate arrays; Hardware; Microprocessors; MIMD; mixed-precision; Power consumption; Power management; Program processors; QNN inference; Quantization (signal); RISC; RISC-V; SIMD |
| Title | Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster With 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode |
| URI | https://ieeexplore.ieee.org/document/10071725 https://www.proquest.com/docview/2819497537 |
| Volume | 70 |
| WOSCitedRecordID | wos000953712700001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 1558-0806 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0029031 issn: 1549-8328 databaseCode: RIE dateStart: 20040101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Dustin%3A+A+16-Cores+Parallel+Ultra-Low-Power+Cluster+With+2b-to-32b+Fully+Flexible+Bit-Precision+and+Vector+Lockstep+Execution+Mode&rft.jtitle=IEEE+transactions+on+circuits+and+systems.+I%2C+Regular+papers&rft.au=Ottavi%2C+Gianmarco&rft.au=Garofalo%2C+Angelo&rft.au=Tagliavini%2C+Giuseppe&rft.au=Conti%2C+Francesco&rft.date=2023-06-01&rft.pub=The+Institute+of+Electrical+and+Electronics+Engineers%2C+Inc.+%28IEEE%29&rft.issn=1549-8328&rft.eissn=1558-0806&rft.volume=70&rft.issue=6&rft.spage=2450&rft_id=info:doi/10.1109%2FTCSI.2023.3254810&rft.externalDBID=NO_FULL_TEXT |
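
The abstract describes 2- to 32-bit arithmetic with all mixed-precision combinations but does not show what a mixed-precision operation looks like. The following is a minimal sketch, assuming operands are packed as signed fields into 32-bit words and that a narrow-weight word is consumed in slices matching the number of wider activations. The function names (`pack`, `unpack`, `mixed_dot`), the field ordering, and the `wgt_offset` parameter are illustrative assumptions, not the instruction set or data layout used by Dustin.

```python
# Illustrative model of a mixed-precision packed dot product on 32-bit words.
# All names and the low-bits-first field order are assumptions for this sketch.

WORD_BITS = 32


def pack(values, bits):
    """Pack signed `bits`-wide integers into one 32-bit word, element 0 in the low bits."""
    assert WORD_BITS % bits == 0 and len(values) == WORD_BITS // bits
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        assert lo <= v <= hi, "value does not fit in the requested bit width"
        word |= (v & mask) << (i * bits)  # two's-complement field
    return word


def unpack(word, bits):
    """Inverse of pack: recover the signed fields from a 32-bit word."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    out = []
    for i in range(WORD_BITS // bits):
        field = (word >> (i * bits)) & mask
        out.append(field - (1 << bits) if field & sign else field)
    return out


def mixed_dot(act_word, act_bits, wgt_word, wgt_bits, wgt_offset=0):
    """Dot product of every activation in act_word with an equally long
    slice of the weights in wgt_word, starting at element wgt_offset."""
    acts = unpack(act_word, act_bits)
    wgts = unpack(wgt_word, wgt_bits)[wgt_offset:wgt_offset + len(acts)]
    assert len(wgts) == len(acts), "weight slice runs past the end of the word"
    return sum(a * w for a, w in zip(acts, wgts))


# Four 8-bit activations against the first four of sixteen 2-bit weights.
activations = pack([1, -2, 3, -4], 8)
weights = pack([1, -2, 0, -1] + [0] * 12, 2)
result = mixed_dot(activations, 8, weights, 2)
```

In this sketch a single word holds sixteen 2-bit weights but only four 8-bit activations, which illustrates the memory-footprint argument made in the abstract: lowering weight precision from 8 to 2 bits shrinks weight storage by a factor of four.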