PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices

Bibliographic details
Published in: 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 245-260
Main authors: Noh, Si Ung; Hong, Junguk; Lim, Chaemin; Park, Seongyeon; Kim, Jeehyun; Kim, Hanjun; Kim, Youngsok; Lee, Jinho
Format: Conference paper
Language: English
Published: IEEE, 29 June 2024
Description
Abstract: Recent dual in-line memory modules (DIMMs) are starting to support processing-in-memory (PIM) by associating their memory banks with processing elements (PEs), allowing applications to overcome the data movement bottleneck by offloading memory-intensive operations to the PEs. Many highly parallel applications have been shown to benefit from these PIM-enabled DIMMs, but further speedup is often limited by the huge overhead of inter-PE collective communication. This overhead mainly comes from the slow CPU-mediated inter-PE communication methods, making it difficult for PIM-enabled DIMMs to accelerate a wider range of applications. Prior studies have tried to alleviate the communication bottleneck, but they lack the flexibility and performance needed for a wide range of applications. In this paper, we present PID-Comm, a fast and flexible inter-PE collective communication framework for commodity PIM-enabled DIMMs. The key idea of PID-Comm is to abstract the PEs as a multi-dimensional hypercube and allow multiple instances of inter-PE collective communication between the PEs belonging to certain dimensions of the hypercube. Leveraging this abstraction, PID-Comm first defines eight inter-PE collective communication patterns that allow applications to easily express their complex communication patterns. Then, PID-Comm provides high-performance implementations of the inter-PE collective communication patterns optimized for the DIMMs. Our evaluation using 16 UPMEM DIMMs and representative parallel algorithms shows that PID-Comm improves performance by up to 5.19× compared to the existing inter-PE communication implementations. The implementation of PID-Comm is available at https://github.com/AIS-SNU/PID-Comm.
DOI: 10.1109/ISCA59077.2024.00027
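
Illustrative sketch (not part of the catalog record or the paper): the abstract's key idea, treating the PEs as a multi-dimensional hypercube and running several independent collectives among PEs that share certain dimensions, can be pictured with the toy Python model below. It is not the PID-Comm API. The function names `groups_along` and `all_reduce_sum`, the 2 x 2 x 2 layout, and the choice of an all-reduce as the example collective are all assumptions made for illustration; the abstract does not list the eight patterns.

```python
from collections import defaultdict
from itertools import product


def groups_along(shape, comm_dims):
    """Partition PE coordinates into communication groups (hypothetical helper).

    Two PEs fall into the same group when they agree on every coordinate
    that is NOT listed in comm_dims, so each group spans the chosen dimensions.
    """
    fixed = [d for d in range(len(shape)) if d not in comm_dims]
    groups = defaultdict(list)
    for coord in product(*(range(n) for n in shape)):
        key = tuple(coord[d] for d in fixed)
        groups[key].append(coord)
    return dict(groups)


def all_reduce_sum(values, shape, comm_dims):
    """Toy all-reduce: every PE ends up holding the sum over its own group."""
    result = {}
    for members in groups_along(shape, comm_dims).values():
        total = sum(values[c] for c in members)
        for c in members:
            result[c] = total
    return result


# Assumed layout: 8 PEs arranged as a 2 x 2 x 2 hypercube, PE i holds value i.
shape = (2, 2, 2)
values = {c: i for i, c in enumerate(product(*(range(n) for n in shape)))}

# Communicating along dimension 0 yields four independent two-PE collectives.
reduced = all_reduce_sum(values, shape, comm_dims={0})
```

Selecting all three dimensions instead (`comm_dims={0, 1, 2}`) collapses the model into a single collective over all eight PEs, which is how one abstraction can cover both global and partitioned communication.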