SplitSync: Bank Group-Level Split-Synchronization for High-Performance DRAM PIM

Bibliographic Details
Published in:	2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main Authors:	Yoon, Byungkuk; Han, Sanghyeok; Park, Gyeonghwan; Kim, Jae-Joon
Format:	Conference paper
Language:	English
Published:	IEEE, 22 June 2025
Subjects:	Design automation; Limiting; Memory architecture; Performance gain; Periodic structures; Random access memory; Reservoirs; Single instruction multiple data; Synchronization; Throughput

Description
Summary:	Processing in Memory (PIM) architectures enhance memory bandwidth by utilizing bank-level parallelism, typically implemented with a SIMD structure where all banks operate simultaneously under a single command. However, this synchronous approach requires the activation of all banks before computation, leading to activation times that exceed computation times, which limits the performance gain. Recently, asynchronous execution PIM has been proposed as an alternative, allowing banks to operate asynchronously and overlap activation with processing to hide the row activation overhead. While effective at reducing row activation overhead, the independent operation requires large shared accumulators for each bank group, increasing area overhead. To address these issues, we propose bank group (BG)-level split-synchronization DRAM PIM, where the bank groups operate asynchronously with respect to one another to hide row activation overhead, while the banks within each bank group operate synchronously to eliminate the need for shared accumulators. Evaluation results show that our proposed design achieves an average throughput improvement of 1.70x and 1.06x over conventional PIM and asynchronous execution PIM, respectively. Furthermore, the area overhead per processing unit (PU) increases by only 1.5% compared to conventional PIM and is significantly lower than that of asynchronous execution PIM.
DOI:	10.1109/DAC63849.2025.11132821
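
The latency-hiding idea in the summary (overlapping row activation with computation instead of finishing all activations first) can be illustrated with a toy two-stage timing model. This is only a sketch under stated assumptions, not the paper's evaluation method: the function names, the unit-less timing values, and the simplification that each row's work is one "activate" stage followed by one "compute" stage are all hypothetical, and real DRAM timing constraints are ignored. The numbers are made up and unrelated to the reported results.

```python
def lockstep_time(num_rows, t_act, t_comp):
    """Fully synchronous toy model: every row is activated, then computed,
    and the next activation cannot start until the computation finishes."""
    return num_rows * (t_act + t_comp)


def overlapped_time(num_rows, t_act, t_comp):
    """Overlapped toy model: the next row's activation proceeds while the
    current row is being computed (a classic two-stage pipeline)."""
    if num_rows == 0:
        return 0
    # First activation is exposed, the last computation is exposed,
    # and every step in between costs the slower of the two stages.
    return t_act + (num_rows - 1) * max(t_act, t_comp) + t_comp


if __name__ == "__main__":
    # Hypothetical, unit-less values chosen only so that activation is
    # slower than computation, as the summary describes.
    rows, t_act, t_comp = 8, 3, 2
    print("lockstep:  ", lockstep_time(rows, t_act, t_comp))
    print("overlapped:", overlapped_time(rows, t_act, t_comp))
```

In this model the overlapped schedule is never slower than the lockstep one, and the gap grows with the number of rows because only the slower stage remains on the critical path.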