Cascade: a Collaborative Algorithm for Scalable and Efficient Neighborhood Allgather


Bibliographic Details
Published in: Proceedings / IEEE International Conference on Cluster Computing, pp. 1-13
Main Authors: Sharifian, Hamed; Sojoodi, Amirhossein; Afsahi, Ahmad
Format: Conference Proceeding
Language: English
Published: IEEE 02.09.2025
ISSN:2168-9253
Description
Summary: Neighborhood collectives are a critical feature of MPI, enabling efficient communication in applications with sparse communication patterns. This research proposes Cascade, a new algorithm for the neighborhood allgather collective that organizes computing nodes along multiple paths based on their distance to the current node. In this approach, messages are forwarded along these paths and propagated until all outgoing neighbors receive them, reducing the communication time. Three performance models are developed to analyze the efficiency of the Cascade algorithm, the default Open MPI algorithm, and the recently proposed Distance-halving neighborhood algorithm, offering insight into the communication cost, scalability, and expected behavior of the algorithms across different system configurations. Experimental results demonstrate that the Cascade algorithm achieves up to 9.54x and 7.05x speedup over Open MPI for random sparse graphs and Moore neighborhoods, respectively. Additionally, the algorithm improves performance by up to 5.25x for a sparse matrix-matrix multiplication kernel. The Cascade algorithm outperforms the Distance-halving neighborhood algorithm by up to 2.57x and 4.81x for random sparse graphs and Moore neighborhoods, respectively. Moreover, Cascade achieves up to a 1.61x performance gain over the Distance-halving neighborhood algorithm for the sparse matrix-matrix multiplication kernel. The predictions of the performance models closely match the experimental results.
DOI: 10.1109/CLUSTER59342.2025.11186497
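
Illustrative note: the summary refers to the neighborhood allgather operation and to Moore neighborhoods without spelling out what they compute. The following is a minimal Python sketch of the reference semantics only, assuming a periodic 2D process grid and direct point-to-point delivery (one message per edge). It is not the Cascade algorithm, the Distance-halving algorithm, or the Open MPI implementation; the function names `moore_neighbors` and `neighbor_allgather` are hypothetical and chosen for illustration.

```python
from itertools import product


def moore_neighbors(rank, rows, cols, radius=1):
    """Ranks in the Moore neighborhood of `rank` on a periodic rows x cols grid.

    Assumption: row-major rank layout with wraparound at the grid edges.
    """
    r, c = divmod(rank, cols)
    neighbors = []
    for dr, dc in product(range(-radius, radius + 1), repeat=2):
        if dr == 0 and dc == 0:
            continue  # skip the rank itself
        n = ((r + dr) % rows) * cols + (c + dc) % cols
        if n != rank and n not in neighbors:
            neighbors.append(n)
    return neighbors


def neighbor_allgather(outgoing, send):
    """Reference semantics of a neighborhood allgather on a directed graph.

    outgoing[rank] lists the ranks that must receive rank's message.
    Returns (recv, messages): recv[rank] maps each source to its message,
    and messages counts the direct point-to-point transfers.
    """
    recv = {rank: {} for rank in outgoing}
    messages = 0
    for src, dests in outgoing.items():
        for dst in dests:
            recv[dst][src] = send[src]  # one direct message per edge
            messages += 1
    return recv, messages


rows, cols = 4, 4
outgoing = {rank: moore_neighbors(rank, rows, cols) for rank in range(rows * cols)}
send = {rank: f"data-{rank}" for rank in outgoing}
recv, messages = neighbor_allgather(outgoing, send)
print(len(recv[0]), messages)  # prints "8 128"
```

In this direct scheme the message count equals the number of edges in the communication graph (16 ranks with 8 neighbors each gives 128). According to the summary, Cascade instead forwards messages along paths of nodes until all outgoing neighbors have received them; that forwarding is not modeled here.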