SplitSync: Bank Group-Level Split-Synchronization for High-Performance DRAM PIM

Bibliographic Details
Published in:	2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main Authors:	Yoon, Byungkuk; Han, Sanghyeok; Park, Gyeonghwan; Kim, Jae-Joon
Format:	Conference Proceeding
Language:	English
Published:	IEEE 22.06.2025
Subjects:	Design automation; Limiting; Memory architecture; Performance gain; Periodic structures; Random access memory; Reservoirs; Single instruction multiple data; Synchronization; Throughput

Description
Summary:	Processing in Memory (PIM) architectures enhance memory bandwidth by utilizing bank-level parallelism, typically implemented with a SIMD structure where all banks operate simultaneously under a single command. However, this synchronous approach requires all banks to be activated before computation, so activation times exceed computation times, which limits the performance gain. Recently, asynchronous execution PIM has been proposed as an alternative, allowing banks to operate asynchronously and overlap activation with processing to hide the row activation overhead. While effective at reducing row activation overhead, the independent operation requires large shared accumulators for each bank group, increasing area overhead. To address these issues, we propose bank group (BG)-level split-synchronization DRAM PIM, in which the bank groups operate asynchronously with respect to one another to hide row activation overhead, while the banks within each bank group operate synchronously to eliminate the need for shared accumulators. Evaluation results show that the proposed design achieves average throughput improvements of 1.70x and 1.06x over conventional PIM and asynchronous execution PIM, respectively. Furthermore, the area overhead per processing unit (PU) increases by only 1.5% compared to conventional PIM and is significantly lower than that of asynchronous execution PIM.
DOI:	10.1109/DAC63849.2025.11132821
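
Illustrative note: the summary's central idea, overlapping row activation with computation, can be sketched with a generic two-stage pipeline bound. This is not the paper's evaluation model. The function names, the per-row timing values, and the assumption of constant stage times with perfect overlap are all hypothetical and chosen only to show why hiding activation helps and why the gain from overlap alone stays below 2x.

```python
def serialized_time(rows: int, t_act: float, t_comp: float) -> float:
    """Lockstep execution: every row pays its activation, then its computation."""
    return rows * (t_act + t_comp)


def overlapped_time(rows: int, t_act: float, t_comp: float) -> float:
    """Idealized two-stage pipeline (assumption: perfect overlap, constant times).

    The first activation and the last computation are exposed; every row in
    between costs only the slower of the two stages.
    """
    if rows == 0:
        return 0
    return t_act + (rows - 1) * max(t_act, t_comp) + t_comp


def overlap_speedup(rows: int, t_act: float, t_comp: float) -> float:
    """Ratio of lockstep time to overlapped time for the same amount of work."""
    return serialized_time(rows, t_act, t_comp) / overlapped_time(rows, t_act, t_comp)


if __name__ == "__main__":
    # Hypothetical timings in arbitrary units, with activation slower than
    # computation, as the summary describes for synchronous PIM.
    print(overlap_speedup(rows=100, t_act=30, t_comp=20))
```

In this toy model the speedup approaches (t_act + t_comp) / max(t_act, t_comp) as the number of rows grows, so it is largest when the two stages are balanced and shrinks as activation dominates.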