Cascade: a Collaborative Algorithm for Scalable and Efficient Neighborhood Allgather
| Published in: | Proceedings / IEEE International Conference on Cluster Computing pp. 1 - 13 |
|---|---|
| Main Authors: | , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 02.09.2025 |
| Subjects: | |
| ISSN: | 2168-9253 |
| Summary: | Neighborhood collectives are a critical feature of MPI, enabling efficient communication in applications with sparse communication patterns. This research proposes Cascade, a new algorithm for the neighborhood allgather collective that organizes computing nodes along multiple paths based on their distance to the current node. Messages are forwarded along these paths and propagated until all outgoing neighbors receive them, which reduces communication time. Three performance models are developed to analyze the efficiency of the Cascade algorithm, the default Open MPI algorithm, and the recently proposed Distance-halving neighborhood algorithm, offering insight into the communication cost, scalability, and expected behavior of the algorithms across different system configurations. Experimental results demonstrate that the Cascade algorithm achieves speedups of up to 9.54x and 7.05x over Open MPI for random sparse graphs and Moore neighborhoods, respectively. Additionally, the algorithm improves performance by up to 5.25x for a sparse matrix-matrix multiplication kernel. The Cascade algorithm outperforms the Distance-halving neighborhood algorithm by up to 2.57x and 4.81x for random sparse graphs and Moore neighborhoods, respectively. Moreover, Cascade achieves up to a 1.61x performance gain over the Distance-halving neighborhood algorithm for the sparse matrix-matrix multiplication kernel. The predictions of the performance models closely match the experimental results. |
|---|---|
| DOI: | 10.1109/CLUSTER59342.2025.11186497 |
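
For readers unfamiliar with the collective named in the summary, the following is a minimal sketch of what a neighborhood allgather computes on a Moore neighborhood. It is not the Cascade algorithm and does not use MPI: it simulates the direct baseline in plain Python, where every rank sends its single block straight to each outgoing neighbor. The helper names `moore_neighbors` and `neighbor_allgather`, the 4 x 4 grid, the radius of 1, and the periodic boundaries are all assumptions chosen for illustration.

```python
from typing import Dict, List


def moore_neighbors(rank: int, rows: int, cols: int, radius: int = 1) -> List[int]:
    """Ranks within Chebyshev distance `radius` of `rank` on a periodic grid.

    Hypothetical helper: row-major rank layout and wraparound are assumptions.
    """
    r, c = divmod(rank, cols)
    neighbors = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue  # a rank is not its own neighbor
            neighbors.append(((r + dr) % rows) * cols + (c + dc) % cols)
    return neighbors


def neighbor_allgather(
    send: Dict[int, str], out_neighbors: Dict[int, List[int]]
) -> Dict[int, Dict[int, str]]:
    """Direct baseline: each rank delivers its block to every outgoing neighbor.

    Returns, for each rank, a map from source rank to the block received.
    """
    recv: Dict[int, Dict[int, str]] = {rank: {} for rank in send}
    for src, dests in out_neighbors.items():
        for dst in dests:
            recv[dst][src] = send[src]  # one simulated message per edge
    return recv


rows, cols = 4, 4
graph = {rank: moore_neighbors(rank, rows, cols) for rank in range(rows * cols)}
send = {rank: f"block-{rank}" for rank in graph}
recv = neighbor_allgather(send, graph)

# Total simulated messages equals the number of directed edges in the graph.
total_messages = sum(len(dests) for dests in graph.values())
```

In this baseline the message count grows with the number of edges in the neighborhood graph. Forwarding-based approaches such as the one described in the summary aim to lower communication time by relaying messages along paths instead of sending every block directly.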