Increasing the Efficiency of Sparse Matrix-Matrix Multiplication with a 2.5D Algorithm and One-Sided MPI

Matrix-matrix multiplication is a basic operation in linear algebra and an essential building block for a wide range of algorithms in various scientific fields. Theory and implementation for the dense, square matrix case are well-developed. If matrices are sparse, with application-specific sparsity...

Detailed description

Bibliographic details
Published in: arXiv.org
Main authors: Lazzaro, Alfio; VandeVondele, Joost; Hutter, Juerg; Schuett, Ole
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 29.05.2017
ISSN: 2331-8422
Online access: Full text
Abstract Matrix-matrix multiplication is a basic operation in linear algebra and an essential building block for a wide range of algorithms in various scientific fields. Theory and implementation for the dense, square matrix case are well developed. If matrices are sparse, with application-specific sparsity patterns, the optimal implementation remains an open question. Here, we explore the performance of communication-reducing 2.5D algorithms and one-sided MPI communication in the context of linear-scaling electronic structure theory. In particular, we extend the DBCSR sparse matrix library, which is the basic building block for linear-scaling electronic structure theory and low-scaling correlated methods in CP2K. The library is specifically designed to efficiently perform block-sparse matrix-matrix multiplication of matrices with a relatively large occupation. We compare the performance of the original implementation, based on Cannon's algorithm and MPI point-to-point communication, with an implementation based on MPI one-sided communication (RMA), in both a 2D and a 2.5D approach. The 2.5D approach trades memory and auxiliary operations for reduced communication, which can lead to a speedup if communication is dominant. The 2.5D algorithm is somewhat easier to implement with one-sided communication. A detailed description of the implementation is provided, including for non-ideal processor topologies, since this is important for actual applications. Given the importance of the precise sparsity pattern, and even of the actual matrix data, which determine the effective fill-in upon multiplication, the tests are performed within the CP2K package with application benchmarks. Results show a substantial boost in performance for the RMA-based 2.5D algorithm, up to 1.80x, a gain that is observed to increase with the number of processes involved in the parallelization.
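As a rough illustration of the two schemes the abstract compares, the following Python sketch simulates Cannon's shift pattern on a q x q grid and a simplified 2.5D variant in which several replicated layers each perform a share of the shift steps before their partial results are summed. It is not taken from DBCSR or CP2K: the function name `cannon_multiply`, the `layers` parameter, and the use of scalars in place of sparse blocks are assumptions made for brevity, and no actual MPI communication is performed.

```python
# Serial simulation of Cannon's algorithm on a q x q grid of "processes".
# Assumption: each grid cell holds a single scalar standing in for a matrix
# block; a real library would hold (sparse) sub-blocks and exchange them
# with MPI instead of list indexing.

def cannon_multiply(A, B, layers=1):
    """Return A @ B for q x q inputs using Cannon-style shifts.

    layers=1 corresponds to the plain 2D scheme. With layers=c > 1, each of
    the c replicated layers starts from a different shift offset, performs
    only q // c multiply-shift steps, and the partial results are summed,
    which mimics the idea behind a 2.5D decomposition.
    """
    q = len(A)
    if q % layers != 0:
        raise ValueError("grid size q must be divisible by the number of layers")
    steps = q // layers
    C = [[0] * q for _ in range(q)]

    for layer in range(layers):
        offset = layer * steps
        # Initial alignment: cell (i, j) receives A[i][k] and B[k][j]
        # with k = (i + j + offset) mod q.
        a = [[A[i][(i + j + offset) % q] for j in range(q)] for i in range(q)]
        b = [[B[(i + j + offset) % q][j] for j in range(q)] for i in range(q)]
        partial = [[0] * q for _ in range(q)]

        for _ in range(steps):
            # Local multiply-accumulate in every cell.
            for i in range(q):
                for j in range(q):
                    partial[i][j] += a[i][j] * b[i][j]
            # Shift A one cell left along rows, B one cell up along columns.
            a = [[a[i][(j + 1) % q] for j in range(q)] for i in range(q)]
            b = [[b[(i + 1) % q][j] for j in range(q)] for i in range(q)]

        # Reduction across layers: sum the partial products into C.
        for i in range(q):
            for j in range(q):
                C[i][j] += partial[i][j]
    return C
```

With `layers=1` every cell takes part in q shift steps. With `layers=c` each layer performs only q // c steps, at the price of holding c copies of the input data and a final reduction, which is the memory-for-communication trade the abstract describes.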
Copyright 2017. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.48550/arxiv.1705.10218
Discipline Physics
EISSN 2331-8422
Genre Working Paper/Pre-Print
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
SubjectTerms Algorithms
Communication
Electronic structure
Linear algebra
Mathematical analysis
Matrix methods
Microprocessors
Multiplication
Parallel processing
Scaling
Sparsity
Topology
URI https://www.proquest.com/docview/2075708463