One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators

Bibliographic details
Published in: Procedia Computer Science, Volume 9, pp. 37-46
Main authors: Yamazaki, Ichitaro; Tomov, Stanimire; Dongarra, Jack
Format: Journal Article
Language: English
Publication details: Elsevier B.V., 2012
Subject: Dense linear algebra; GPU accelerators; one-sided factorization
ISSN: 1877-0509
Abstract One-sided dense matrix factorizations are important computational kernels in many scientific and engineering simulations. In this paper, we propose two extensions of both right-looking (LU and QR) and left-looking (Cholesky) one-sided factorization algorithms to utilize the computing power of current heterogeneous architectures. We first describe a new class of non-GPU-resident algorithms that factorize only a submatrix of the coefficient matrix on a GPU at a time. We then extend the algorithms to use multiple GPUs attached to a multicore. These extensions not only enable the factorization of a matrix that does not fit in the aggregate memory of the multiple GPUs at once, but also offer the potential to fully utilize the computing power of the architecture. Since data movement is expensive on current architectures, these algorithms are designed to minimize data movement at multiple levels. To demonstrate the effectiveness of these algorithms, we present their performance on a single compute node of the Keeneland system, which consists of twelve Intel Xeon processors and three NVIDIA GPUs. The performance results show negligible overheads for our non-GPU-resident algorithms and scalable performance for our multi-GPU algorithms. These extensions are now part of the MAGMA software package, a set of state-of-the-art dense linear algebra routines for a multicore with GPUs.
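The idea of factorizing only a submatrix at a time can be illustrated with a minimal NumPy sketch of a blocked left-looking Cholesky factorization. This is an illustration under stated assumptions, not the MAGMA implementation: the function name `left_looking_cholesky`, the panel width `nb`, and the use of plain array copies to stand in for host-to-GPU transfers are all hypothetical choices made for this example.

```python
import numpy as np


def left_looking_cholesky(a, nb):
    """Return lower-triangular L with L @ L.T == a, one block column at a time.

    Hypothetical sketch: array copies stand in for host/GPU transfers.
    """
    n = a.shape[0]
    l = np.zeros((n, n), dtype=float)
    for j0 in range(0, n, nb):
        j1 = min(j0 + nb, n)
        w = j1 - j0  # width of the current panel
        # "Upload" only the current block column (rows j0..n-1).
        panel = np.array(a[j0:, j0:j1], dtype=float)
        # Left-looking update: apply each previously factored block column.
        for k0 in range(0, j0, nb):
            k1 = k0 + nb  # always <= j0 because j0 is a multiple of nb
            prev = l[j0:, k0:k1]  # streamed in one block column at a time
            panel -= prev @ prev[:w].T
        # Factor the diagonal block, then solve for the rows below it.
        diag = np.linalg.cholesky(panel[:w])
        panel[:w] = diag
        if j1 < n:
            # Solve X @ diag.T = B for X by transposing the system.
            panel[w:] = np.linalg.solve(diag, panel[w:].T).T
        # "Download" the finished panel back to host storage.
        l[j0:, j0:j1] = panel
    return l
```

In this sketch only one panel of width `nb` is held in the working array at a time, and every earlier block column is read but never modified, which is the property that makes a left-looking scheme attractive when data movement is expensive.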
Author Yamazaki, Ichitaro
Dongarra, Jack
Tomov, Stanimire
Author_xml – sequence: 1
  givenname: Ichitaro
  surname: Yamazaki
  fullname: Yamazaki, Ichitaro
  email: iyamazak@cs.utk.edu
– sequence: 2
  givenname: Stanimire
  surname: Tomov
  fullname: Tomov, Stanimire
  email: tomov@cs.utk.edu
– sequence: 3
  givenname: Jack
  surname: Dongarra
  fullname: Dongarra, Jack
  email: dongarra@cs.utk.edu
CitedBy_id crossref_primary_10_1002_cpe_3152
crossref_primary_10_1002_cpe_4012
crossref_primary_10_1002_cpe_5754
crossref_primary_10_1002_cpe_4504
crossref_primary_10_1016_j_cam_2014_02_011
crossref_primary_10_1109_TPDS_2018_2842785
Cites_doi 10.1145/355841.355847
10.1109/IPDPSW.2010.5470941
10.1109/AICCSA.2011.6126599
10.1177/1094342010385729
10.1137/1.9781611971811
10.1109/SAAHPC.2011.18
10.1007/978-3-540-85451-7_79
10.1137/1.9780898719604
ContentType Journal Article
Copyright 2012
Copyright_xml – notice: 2012
DBID 6I.
AAFTH
AAYXX
CITATION
DOI 10.1016/j.procs.2012.04.005
DatabaseName ScienceDirect Open Access Titles
Elsevier:ScienceDirect:Open Access
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1877-0509
EndPage 46
ExternalDocumentID 10_1016_j_procs_2012_04_005
S1877050912001263
ISICitedReferencesCount 15
ISSN 1877-0509
IngestDate Sat Nov 29 02:44:17 EST 2025
Tue Nov 18 21:47:21 EST 2025
Wed May 17 00:09:02 EDT 2023
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords one-sided factorization
GPU accelerators
Dense linear algebra
Language English
License http://creativecommons.org/licenses/by-nc-nd/3.0
https://www.elsevier.com/tdm/userlicense/1.0
LinkModel OpenURL
OpenAccessLink https://dx.doi.org/10.1016/j.procs.2012.04.005
PageCount 10
PublicationCentury 2000
PublicationDate 2012
2012-00-00
PublicationDateYYYYMMDD 2012-01-01
PublicationDate_xml – year: 2012
  text: 2012
PublicationDecade 2010
PublicationTitle Procedia computer science
PublicationYear 2012
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
References M. Horton, S. Tomov, J. Dongarra, A class of hybrid LAPACK algorithms for multicore and GPU architectures, in: Proceedings of Symposium for Application Accelerators in High Performance Computing (SAAHPC), 2011.
J. Dongarra, M. Faverge, H. Ltaief, P. Luszczek, Achieving numerical accuracy and high performance using recursive tile LU factorization, Tech. rep., Innovative Computing Laboratory, University of Tennessee (2011).
E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, D. Sorensen, LAPACK Users’ Guide, 3rd Edition, Society for Industrial and Applied Mathematics, 1999.
S. Tomov, R. Nath, P. Du, J. Dongarra, MAGMA version Users’ guide, available at http://icl.eecs.utk.edu/magma/ (2009).
S. Barrachina, M. Castillo, F.D. Igual, R. Mayo, E.S. Quintana-Orti, Solving dense linear systems on graphics processors, in: Euro-Par 2008. Parallel Processing, Vol. 5168 of Lecture Notes in Computer Science, Springer Berlin/Heidelberg, 2008, pp. 739-748.
J. Dongarra, J. Bunch, C. Moler, G. Stewart, LINPACK Users’ Guide, Society for Industrial and Applied Mathematics, 1979.
S. Tomov, R. Nath, H. Ltaief, J. Dongarra, Dense linear algebra solvers for multicore with GPU accelerators, in: Proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2010.
J. Vetter, R. Glassbrook, J. Dongarra, K. Schwan, B. Loftis, S. McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, S. Yalamanchili, Keeneland: Bringing heterogeneous GPU computing to the computational science community, IEEE Computing in Science and Engineering 13 (2011) 90-5, available also at http://dx.doi.org/10.1109/MCSE.2011.83.
R. Nath, S. Tomov, J. Dongarra, An improved MAGMA GEMM for Fermi graphics processing units, Int. J. High Perform. Comput. Appl. 24 (2010) 511-515.
E. Agullo, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, J. Langou, H. Ltaief, P. Luszczek, A. YarKhan, PLASMA version Users’ guide, available at http://icl.eecs.utk.edu/plasma/.
E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou, H. Ltaief, S. Tomov, LU factorization for accelerator-based systems, in: 9th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 11), 2011.
M. Baboulin, J. Dongarra, S. Tomov, Some issues in dense linear algebra for multicore and special purpose architectures, Tech. Rep. UT-CS-08-200, University of Tennessee (2008).
C.L. Lawson, R.J. Hanson, D. Kincaid, F.T. Krogh, Basic Linear Algebra Subprograms for FORTRAN usage, ACM Trans. Math. Soft. 5 (1979) 308-323.
StartPage 37
Title One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators
URI https://dx.doi.org/10.1016/j.procs.2012.04.005
Volume 9