Simultaneous Input and Output Matrix Partitioning for Outer-Product--Parallel Sparse Matrix-Matrix Multiplication

Bibliographic details
Published in:	SIAM Journal on Scientific Computing, Vol. 36, No. 5, pp. C568-C590
Main authors:	Akbudak, Kadir; Aykanat, Cevdet
Format:	Journal Article
Language:	English
Published:	01.01.2014
Subjects:	Algorithms; Computation; Libraries; Mathematical models; Multiplication; Partitioning; Replication; Two dimensional
ISSN:	1064-8275, 1095-7197

Description
Summary:	For outer-product--parallel sparse matrix-matrix multiplication (SpGEMM) of the form $C\!=\!A\!\times\!B$, we propose three hypergraph models that achieve simultaneous partitioning of input and output matrices without any replication of input data. All three hypergraph models perform conformable one-dimensional (1D) columnwise and 1D rowwise partitioning of the input matrices $A$ and $B$, respectively. The first hypergraph model performs two-dimensional (2D) nonzero-based partitioning of the output matrix, whereas the second and third models perform 1D rowwise and 1D columnwise partitioning of the output matrix, respectively. This partitioning scheme induces a two-phase parallel SpGEMM algorithm, where communication-free local SpGEMM computations constitute the first phase and multiple single-node-accumulation operations on the local SpGEMM results constitute the second phase. In these models, the two partitioning constraints defined on the weights of vertices encode balancing the computational loads of processors during the two separate phases of the parallel SpGEMM algorithm. The partitioning objective of minimizing the cutsize defined over the cut nets encodes minimizing the total volume of communication that will occur during the second phase of the parallel SpGEMM algorithm. An MPI-based parallel SpGEMM library is developed to verify the validity of our models in practice. Parallel runs of the library for a wide range of realistic SpGEMM instances on two large-scale parallel systems, JUQUEEN (an IBM BlueGene/Q system) and SuperMUC (an Intel-based cluster), show that the proposed hypergraph models attain high speedup values.
DOI:	10.1137/13092589X
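
Illustrative sketch (not part of the catalog record or the authors' library): the two-phase scheme described in the summary can be mimicked in a few lines of Python. This is a minimal single-process simulation under stated assumptions: the dictionary-based sparse layout, the function names `local_spgemm` and `two_phase_spgemm`, the hand-written partition `k_part`, and the rowwise output owner `c_owner` are all hypothetical choices made for illustration. No hypergraph partitioning is performed here, and communication volume is simply counted as the number of partial results that would leave the processor that computed them.

```python
from collections import defaultdict

def local_spgemm(A_cols, B_rows, owned_k):
    """Phase 1 on one processor: sum the outer products of the owned
    column k of A and row k of B. Needs no data from other processors."""
    partial = defaultdict(float)
    for k in owned_k:
        for i, a_ik in A_cols.get(k, {}).items():
            for j, b_kj in B_rows.get(k, {}).items():
                partial[(i, j)] += a_ik * b_kj
    return dict(partial)

def two_phase_spgemm(A_cols, B_rows, k_part, c_owner, nprocs):
    """Simulate both phases in one process (assumption: no real MPI).

    A_cols[k] = {i: a_ik}  -- A stored by columns (1D columnwise partition)
    B_rows[k] = {j: b_kj}  -- B stored by rows (conformable 1D rowwise partition)
    k_part[k] = processor that owns column k of A and row k of B
    c_owner(i, j) = processor that owns output nonzero c_ij
    """
    # Phase 1: communication-free local multiplications.
    partials = []
    for p in range(nprocs):
        owned = [k for k, q in k_part.items() if q == p]
        partials.append(local_spgemm(A_cols, B_rows, owned))

    # Phase 2: each partial result is accumulated at the owner of c_ij.
    # A partial result computed away from its owner counts as one unit sent.
    C = defaultdict(float)
    volume = 0
    for p, partial in enumerate(partials):
        for (i, j), value in partial.items():
            if c_owner(i, j) != p:
                volume += 1
            C[(i, j)] += value
    return dict(C), volume

# A = [[1, 0, 2], [0, 3, 0]] and B = [[4, 0], [0, 5], [6, 7]]
A_cols = {0: {0: 1.0}, 1: {1: 3.0}, 2: {0: 2.0}}
B_rows = {0: {0: 4.0}, 1: {1: 5.0}, 2: {0: 6.0, 1: 7.0}}
k_part = {0: 0, 1: 1, 2: 1}            # hand-picked partition of the inner index
c_owner = lambda i, j: i               # 1D rowwise output: row i lives on processor i

C, volume = two_phase_spgemm(A_cols, B_rows, k_part, c_owner, nprocs=2)
print(C, volume)
```

In this toy instance, processor 1 produces two partial results for row 0 of the output, which belongs to processor 0, so two values would be communicated in the second phase. A different choice of `k_part` or `c_owner` changes that count, which is the quantity the summary says the partitioning objective aims to minimize.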