DDC-PIM: Efficient Algorithm/Architecture Co-Design for Doubling Data Capacity of SRAM-Based Processing-in-Memory

Bibliographic details
Published in:	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 43, Issue 3, pp. 906-918
Main authors:	Duan, Cenlin; Yang, Jianlei; He, Xiaolin; Qi, Yingjie; Wang, Yikun; Wang, Yiou; He, Ziyan; Yan, Bonan; Wang, Xueyan; Jia, Xiaotao; Pan, Weitao; Zhao, Weisheng
Format:	Journal Article
Language:	English
Published:	New York, IEEE, 1 March 2024, The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithm/architecture co-design; Algorithms; Artificial neural networks; Co-design; Computational modeling; Computer architecture; Density; doubling data capacity; In-memory computing; Memory architecture; Neural networks; Parallel processing; processing-in-memory (PIM); SRAM cells; SRAM-PIM; Static random access memory
ISSN:	0278-0070, 1937-4151

Description
Summary:	Processing-in-memory (PIM), as a novel computing paradigm, provides significant performance benefits from the aspect of effective data movement reduction. SRAM-based PIM has been demonstrated as one of the most promising candidates due to its endurance and compatibility. However, the integration density of SRAM-based PIM is much lower than other nonvolatile memory-based ones, due to its inherent 6T structure for storing a single bit. Within comparable area constraints, SRAM-based PIM exhibits notably lower capacity. Thus, aiming to unleash its capacity potential, we propose DDC-PIM, an efficient algorithm/architecture co-design methodology that effectively doubles the equivalent data capacity. At the algorithmic level, we propose a filter-wise complementary correlation (FCC) algorithm to obtain a bitwise complementary pair. At the architecture level, we exploit the intrinsic cross-coupled structure of 6T SRAM to store the bitwise complementary pair in their complementary states $(Q/\overline{Q})$, thereby maximizing the data capacity of each SRAM cell. The dual-broadcast input structure and reconfigurable unit support both depthwise and pointwise convolution, adhering to the requirements of various neural networks. Evaluation results show that DDC-PIM yields about $2.84\times$ speedup on MobileNetV2 and $2.69\times$ on EfficientNet-B0 with negligible accuracy loss compared with PIM baseline implementation. Compared with state-of-the-art SRAM-based PIM macros, DDC-PIM achieves up to $8.41\times$ and $2.75\times$ improvement in weight density and area efficiency, respectively.
DOI:	10.1109/TCAD.2023.3330819
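
The storage idea in the summary (one 6T cell holding a bit on its Q node and the complementary bit on its Q-bar node) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' FCC algorithm or macro design: the 8-bit unsigned weight width, the helper names `to_cells` and `read_pair`, and the sample values are all hypothetical. It shows that if two filters are exact bitwise complements, a single array of stored bits is enough to recover both.

```python
import numpy as np

# Assumption: 8-bit unsigned weights. The summary does not state a precision.

def to_cells(weights):
    """Unpack uint8 weights into a (n, 8) array of 0/1 bits, MSB first.

    Each entry models the Q node of one SRAM cell (hypothetical model).
    """
    w = np.asarray(weights, dtype=np.uint8)
    return np.unpackbits(w[:, None], axis=1)

def read_pair(q):
    """Recover both filters from one cell array.

    The first filter is read from Q, the second from Q-bar (the inverted bit).
    """
    q_bar = 1 - q  # complementary state of every cell
    first = np.packbits(q, axis=1)[:, 0]
    second = np.packbits(q_bar, axis=1)[:, 0]
    return first, second

# Illustrative values only: a filter and its exact bitwise complement.
filter_a = np.array([3, 200, 17, 255], dtype=np.uint8)
filter_b = np.bitwise_not(filter_a)

cells = to_cells(filter_a)          # only filter_a is written to the array
read_a, read_b = read_pair(cells)   # both filters come back out
```

In this toy model `cells` holds 32 bits yet yields 64 bits of weights, which is the sense in which the equivalent capacity doubles. How real filters are brought into complementary form is the role of the FCC algorithm, whose details are not given in the summary.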