Dedicated Instruction Set for Pattern-based Data Transfers: an Experimental Validation on Systems Containing In-Memory Computing Units


Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 42, No. 11, p. 1
Main Authors: Mambu, Kevin; Charles, Henri-Pierre; Kooli, Maha
Format: Journal Article
Language: English
Published: New York: IEEE, 1 November 2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 0278-0070, 1937-4151
Description
Summary: In-Memory Computing (IMC) aims at solving the performance gap between CPU and memories introduced by the memory wall. However, general-purpose IMC does not consider the optimization of data transfers for patterns such as stencils and convolutions. This paper proposes a new Instruction Set Architecture (ISA) and a novel pattern encoding for IMC to transfer and organize data streams in order to perform computation efficiently. This instruction set is implemented on the Data-locality Management Unit (DMU) as a subset of the Computational SRAM (C-SRAM) Instruction Set Architecture. A programming model to interact with the DMU at the language level is also presented. The DMU ISA is evaluated on six applications run on three different system nodes. These system nodes are based on existing RISC-V cores and range from the embedded to the high-performance computing domain. On embedded systems, experiments show on average a speed-up of ×8.81, an energy reduction factor of ×6.81, and an improvement in the number of operations per cycle of ×4.59 for the C-SRAM architecture integrating the proposed DMU ISA, compared to a reference implementation. Across all system nodes, results also show an improvement in the number of operations per cycle of ×2.99 compared to a reference implementation.
DOI: 10.1109/TCAD.2023.3258346