Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster With 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode

Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms on resource-constrained and battery-powered devices while retaining the flexibility granted by instruction processor-based architecture...

Celý popis

Uloženo v:

Podrobná bibliografie
Vydáno v:	IEEE transactions on circuits and systems. I, Regular papers Ročník 70; číslo 6; s. 2450 - 2463
Hlavní autoři:	Ottavi, Gianmarco, Garofalo, Angelo, Tagliavini, Giuseppe, Conti, Francesco, Mauro, Alfio Di, Benini, Luca, Rossi, Davide
Médium:	Journal Article
Jazyk:	angličtina
Vydáno:	New York IEEE 01.06.2023 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Témata:	Algorithms Arithmetic Artificial neural networks Clusters Computer architecture Electronic devices Energy efficiency Field programmable gate arrays Hardware Microprocessors MIMD mixed-precision Power consumption Power management Program processors QNN inference Quantization (signal) RISC RISC-V SIMD
ISSN:	1549-8328, 1558-0806
On-line přístup:	Získat plný text
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Popis
Shrnutí:	Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms on resource-constrained and battery-powered devices while retaining the flexibility granted by instruction processor-based architectures poses several challenges related to memory footprint, computational throughput, and energy efficiency. Low-bitwidth and mixed-precision arithmetic have been proven to be valid strategies for tackling these problems. We present Dustin, a fully programmable compute cluster integrating 16 RISC-V cores capable of 2- to 32-bit arithmetic and all possible mixed-precision combinations. In addition to a conventional Multiple-Instruction Multiple-Data (MIMD) processing paradigm, Dustin introduces a Vector Lockstep Execution Mode (VLEM) to minimize power consumption in highly data-parallel kernels. In VLEM, a single leader core fetches instructions and broadcasts them to the 15 follower cores. Clock gating Instruction Fetch (IF) stages and private caches of the follower cores leads to 38% power reduction. The cluster, implemented in 65 nm CMOS technology, achieves a peak performance of 58 GOPS and a peak efficiency of 1.15 TOPS/W.
Bibliografie:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ISSN:	1549-8328 1558-0806
DOI:	10.1109/TCSI.2023.3254810