One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators
One-sided dense matrix factorizations are important computational kernels in many scientific and engineering simulations. In this paper, we propose two extensions of both right-looking (LU and QR) and left-looking (Cholesky) one-sided factorization algorithms to utilize the computing power of curren...
| Published in: | Procedia Computer Science, Vol. 9, pp. 37-46 |
|---|---|
| Main authors: | Yamazaki, Ichitaro; Tomov, Stanimire; Dongarra, Jack |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Elsevier B.V., 2012 |
| Subjects: | Dense linear algebra; GPU accelerators; one-sided factorization |
| ISSN: | 1877-0509 |
| Online access: | Get full text |
| Abstract | One-sided dense matrix factorizations are important computational kernels in many scientific and engineering simulations. In this paper, we propose two extensions of both right-looking (LU and QR) and left-looking (Cholesky) one-sided factorization algorithms to utilize the computing power of current heterogeneous architectures. We first describe a new class of non-GPU-resident algorithms that factorize only a submatrix of a coefficient matrix on a GPU at a time. We then extend the algorithms to use multiple GPUs attached to a multicore. These extensions not only enable the factorization of a matrix that does not fit in the aggregated memory of the multiple GPUs at once, but also provide the potential to fully utilize the computing power of the architectures. Since data movement is expensive on current architectures, these algorithms are designed to minimize data movement at multiple levels. To demonstrate the effectiveness of these algorithms, we present their performance on a single compute node of the Keeneland system, which consists of twelve Intel Xeon processors and three NVIDIA GPUs. The performance results show negligible overheads for our non-GPU-resident algorithms and scalable performance for our multi-GPU algorithms. These extensions are now part of the MAGMA software package, a set of state-of-the-art dense linear algebra routines for a multicore with GPUs. |
|---|---|
| Author | Yamazaki, Ichitaro; Dongarra, Jack; Tomov, Stanimire |
| Author_xml | – sequence: 1 givenname: Ichitaro surname: Yamazaki fullname: Yamazaki, Ichitaro email: iyamazak@cs.utk.edu – sequence: 2 givenname: Stanimire surname: Tomov fullname: Tomov, Stanimire email: tomov@cs.utk.edu – sequence: 3 givenname: Jack surname: Dongarra fullname: Dongarra, Jack email: dongarra@cs.utk.edu |
| CitedBy_id | crossref_primary_10_1002_cpe_3152 crossref_primary_10_1002_cpe_4012 crossref_primary_10_1002_cpe_5754 crossref_primary_10_1002_cpe_4504 crossref_primary_10_1016_j_cam_2014_02_011 crossref_primary_10_1109_TPDS_2018_2842785 |
| Cites_doi | 10.1145/355841.355847 10.1109/IPDPSW.2010.5470941 10.1109/AICCSA.2011.6126599 10.1177/1094342010385729 10.1137/1.9781611971811 10.1109/SAAHPC.2011.18 10.1007/978-3-540-85451-7_79 10.1137/1.9780898719604 |
| ContentType | Journal Article |
| Copyright | 2012 |
| Copyright_xml | – notice: 2012 |
| DBID | 6I. AAFTH AAYXX CITATION |
| DOI | 10.1016/j.procs.2012.04.005 |
| DatabaseName | ScienceDirect Open Access Titles Elsevier:ScienceDirect:Open Access CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 1877-0509 |
| EndPage | 46 |
| ExternalDocumentID | 10_1016_j_procs_2012_04_005 S1877050912001263 |
| ISICitedReferencesCount | 15 |
| ISSN | 1877-0509 |
| IngestDate | Sat Nov 29 02:44:17 EST 2025 Tue Nov 18 21:47:21 EST 2025 Wed May 17 00:09:02 EDT 2023 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | one-sided factorization; GPU accelerators; dense linear algebra |
| Language | English |
| License | http://creativecommons.org/licenses/by-nc-nd/3.0 https://www.elsevier.com/tdm/userlicense/1.0 |
| LinkModel | OpenURL |
| OpenAccessLink | https://dx.doi.org/10.1016/j.procs.2012.04.005 |
| PageCount | 10 |
| ParticipantIDs | crossref_primary_10_1016_j_procs_2012_04_005 crossref_citationtrail_10_1016_j_procs_2012_04_005 elsevier_sciencedirect_doi_10_1016_j_procs_2012_04_005 |
| PublicationCentury | 2000 |
| PublicationDate | 2012 2012-00-00 |
| PublicationDateYYYYMMDD | 2012-01-01 |
| PublicationDate_xml | – year: 2012 text: 2012 |
| PublicationDecade | 2010 |
| PublicationTitle | Procedia computer science |
| PublicationYear | 2012 |
| Publisher | Elsevier B.V |
| Publisher_xml | – name: Elsevier B.V |
| SourceID | crossref elsevier |
| SourceType | Enrichment Source Index Database Publisher |
| StartPage | 37 |
| SubjectTerms | Dense linear algebra; GPU accelerators; one-sided factorization |
| Title | One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators |
| URI | https://dx.doi.org/10.1016/j.procs.2012.04.005 |
| Volume | 9 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVHPJ databaseName: ROAD: Directory of Open Access Scholarly Resources customDbUrl: eissn: 1877-0509 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0000388917 issn: 1877-0509 databaseCode: M~E dateStart: 20100101 isFulltext: true titleUrlDefault: https://road.issn.org providerName: ISSN International Centre |
| linkProvider | ISSN International Centre |
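
The abstract mentions a left-looking Cholesky factorization that works on only a submatrix (a panel of columns) at a time. The following is a minimal CPU-only sketch of that general idea in plain Python, not the MAGMA implementation: the function name `left_looking_cholesky` and the panel width `nb` are hypothetical, and the "device" is simply a local list standing in for GPU memory. Only the current panel is held in the working buffer; previously factored columns are read back one at a time to update it.

```python
import math


def left_looking_cholesky(a, nb):
    """Return lower-triangular L with L * L^T == a, using panels of nb columns.

    Illustrative sketch only: `a` is a symmetric positive definite matrix
    given as a list of lists; `panel` plays the role of the submatrix that
    would reside on an accelerator in a non-resident scheme.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j0 in range(0, n, nb):
        j1 = min(j0 + nb, n)
        width = j1 - j0
        # "Load" the current panel: rows j0..n-1, columns j0..j1-1.
        panel = [[float(a[i][j]) for j in range(j0, j1)] for i in range(j0, n)]
        # Left-looking update: apply every previously factored column k < j0.
        for k in range(j0):
            for i in range(j0, n):
                lik = l[i][k]
                for j in range(j0, min(i + 1, j1)):
                    panel[i - j0][j - j0] -= lik * l[j][k]
        # Factor the panel itself, one column at a time.
        for c in range(width):
            for k in range(c):
                pck = panel[c][k]
                for r in range(c, n - j0):
                    panel[r][c] -= panel[r][k] * pck
            d = math.sqrt(panel[c][c])
            for r in range(c, n - j0):
                panel[r][c] /= d
        # "Store" the lower-triangular part of the finished panel back into L.
        for r in range(n - j0):
            for c in range(min(r + 1, width)):
                l[j0 + r][j0 + c] = panel[r][c]
    return l


if __name__ == "__main__":
    A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
    for row in left_looking_cholesky(A, nb=2):
        print(row)
```

In a real non-GPU-resident setting, the panel load and store steps would presumably correspond to host-to-device and device-to-host transfers, and the update loops to matrix-multiply kernels; choosing `nb` trades working-set size against the number of times earlier columns must be streamed back in.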