Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures
| Published in: | IEEE Transactions on Parallel and Distributed Systems, Vol. 21, no. 4, pp. 417-423 |
|---|---|
| Main Authors: | Ltaief, H.; Kurzak, J.; Dongarra, J. |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE, 01.04.2010. The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 1045-9219, 1558-2183 |
| Abstract | The objective of this paper is to extend, in the context of multicore architectures, the concepts of tile algorithms [Buttari et al., 2007] for Cholesky, LU, and QR factorizations to the family of two-sided factorizations. In particular, the bidiagonal reduction of a general, dense matrix is often used as a preprocessing step for calculating the Singular Value Decomposition. Furthermore, in the Top500 list of June 2008, 98 percent of the fastest parallel systems in the world were based on multicores. This confronts the scientific software community with both a daunting challenge and a unique opportunity. The challenge arises from the mismatch between the design of systems based on this new chip architecture (hundreds of thousands of nodes, a million or more cores, reduced bandwidth and memory available to cores) and the components of the traditional software stack, such as numerical libraries, on which scientific applications have relied for their accuracy and performance. The many-core trend has exacerbated the problem further, and it becomes critical to efficiently integrate existing or new numerical linear algebra algorithms suitable for such hardware. By exploiting the concept of tile algorithms in the multicore environment (i.e., a high level of parallelism with fine granularity and a high-performance data representation combined with dynamic data-driven execution), the band bidiagonal reduction presented here achieves 94 Gflop/s on a 12,000 × 12,000 matrix with 16 Intel Tigerton 2.4 GHz processors. The main drawback of the tile algorithms approach for the bidiagonal reduction is that the full reduction cannot be obtained in one stage. Other methods have to be considered to further reduce the band matrix to the required form. |
|---|---|
| Author | Ltaief, H. Dongarra, J. Kurzak, J. |
| Author_xml | – sequence: 1 givenname: H. surname: Ltaief fullname: Ltaief, H. organization: Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA – sequence: 2 givenname: J. surname: Kurzak fullname: Kurzak, J. organization: Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA – sequence: 3 givenname: J. surname: Dongarra fullname: Dongarra, J. organization: Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA |
| CODEN | ITDSEO |
| CitedBy_id | crossref_primary_10_1109_TCSVT_2022_3145341 crossref_primary_10_1177_1094342013502097 crossref_primary_10_1145_2887740 crossref_primary_10_1137_17M1117732 crossref_primary_10_1145_3764932 crossref_primary_10_1007_s11075_013_9744_5 crossref_primary_10_1016_j_parco_2017_10_004 crossref_primary_10_1109_TPDS_2012_161 crossref_primary_10_1145_2894747 crossref_primary_10_1002_cpe_3306 crossref_primary_10_1145_2450153_2450154 |
| Cites_doi | 10.1137/0702016 10.1137/1.9780898719574 10.1145/355984.355990 10.1145/1055531.1055534 10.1016/0167-8191(95)00064-X 10.1109/PDP.2008.37 10.1016/j.laa.2004.09.019 10.1016/S0024-3795(01)00569-9 10.1137/1.9780898719604 10.1016/0167-8191(95)00015-g 10.1016/0010-4655(96)00017-3 10.1007/3-540-70734-4_9 10.1137/1.9781611971408 10.1109/TPDS.2007.70813 10.1145/1248377.1248394 10.3233/SPR-2008-0268 10.1007/BFb0095328 10.1145/1377612.1377615 10.1002/cpe.1301 10.1137/050636723 10.1147/rd.444.0605 10.1016/S0167-8191(99)00041-1 10.1175/mwr3289.1 10.1137/0910005 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Apr 2010 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Apr 2010 |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D F28 FR3 |
| DOI | 10.1109/TPDS.2009.79 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005-present IEEE All-Society Periodicals Package (ASPP) 1998-Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional ANTE: Abstracts in New Technology & Engineering Engineering Research Database |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional Engineering Research Database ANTE: Abstracts in New Technology & Engineering |
| DatabaseTitleList | Technology Research Database Technology Research Database Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering Computer Science Architecture |
| EISSN | 1558-2183 |
| EndPage | 423 |
| ExternalDocumentID | 2543367921 10_1109_TPDS_2009_79 4967575 |
| Genre | orig-research |
| GroupedDBID | --Z -~X .DC 0R~ 29I 4.4 5GY 5VS 6IK 97E AAJGR AARMG AASAJ AAWTH ABAZT ABFSI ABQJQ ABVLG ACGFO ACIWK AENEX AETIX AGQYO AGSQL AHBIQ AI. AIBXA AKJIK AKQYR ALLEH ALMA_UNASSIGNED_HOLDINGS ASUFR ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 DU5 E.L EBS EJD HZ~ H~9 ICLAB IEDLZ IFIPE IFJZH IPLJI JAVBF LAI M43 MS~ O9- OCL P2P PQQKQ RIA RIE RNI RNS RZB TN5 TWZ UHB VH1 AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D F28 FR3 |
| ID | FETCH-LOGICAL-c347t-d1a6c75dfc1dd45bc5fd2f92d372eba4e7f88fe8d26ce79a1524cf6f3424d4b73 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 11 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000274794200001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1045-9219 |
| IngestDate | Thu Oct 02 07:02:33 EDT 2025 Tue Sep 30 23:39:27 EDT 2025 Sun Nov 30 05:05:27 EST 2025 Tue Nov 18 21:21:28 EST 2025 Sat Nov 29 08:12:43 EST 2025 Wed Aug 27 02:52:19 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c347t-d1a6c75dfc1dd45bc5fd2f92d372eba4e7f88fe8d26ce79a1524cf6f3424d4b73 |
| Notes | ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 14 ObjectType-Article-1 ObjectType-Feature-2 content type line 23 |
| PQID | 911999993 |
| PQPubID | 23500 |
| PageCount | 7 |
| ParticipantIDs | proquest_journals_911999993 proquest_miscellaneous_743725943 crossref_citationtrail_10_1109_TPDS_2009_79 ieee_primary_4967575 proquest_miscellaneous_1671263946 crossref_primary_10_1109_TPDS_2009_79 |
| PublicationCentury | 2000 |
| PublicationDate | 2010-April 2010-4-00 20100401 |
| PublicationDateYYYYMMDD | 2010-04-01 |
| PublicationDate_xml | – month: 04 year: 2010 text: 2010-April |
| PublicationDecade | 2010 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on parallel and distributed systems |
| PublicationTitleAbbrev | TPDS |
| PublicationYear | 2010 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| References | ref13 ref12 Golub (ref14) 1996 ref15 ref30 ref11 ref10 ref17 ref16 ref19 ref18 Ltaief (ref22) 2008 ref24 ref23 ref26 ref25 ref20 ref21 Trefethen (ref28) 1997 ref8 Yip (ref29) 1979 ref7 ref9 ref4 ref3 ref6 ref5 Stewart (ref27) 1998 |
| References_xml | – ident: ref15 doi: 10.1137/0702016 – volume-title: Numerical Linear Algebra year: 1997 ident: ref28 doi: 10.1137/1.9780898719574 – ident: ref8 doi: 10.1145/355984.355990 – volume-title: Matrix Computations year: 1996 ident: ref14 – ident: ref17 doi: 10.1145/1055531.1055534 – ident: ref21 doi: 10.1016/0167-8191(95)00064-X – ident: ref24 doi: 10.1109/PDP.2008.37 – ident: ref4 doi: 10.1016/j.laa.2004.09.019 – ident: ref25 doi: 10.1016/S0024-3795(01)00569-9 – ident: ref3 doi: 10.1137/1.9780898719604 – ident: ref5 doi: 10.1016/0167-8191(95)00015-g – ident: ref9 doi: 10.1016/0010-4655(96)00017-3 – ident: ref13 doi: 10.1007/3-540-70734-4_9 – year: 2008 ident: ref22 article-title: LAPACK Working Note 208: Parallel Block Hessenberg Reduction Using Algorithms-by-Tiles for Multicore Architectures Revisited – volume-title: Matrix Algorithms Volume I: Matrix Decompositions year: 1998 ident: ref27 doi: 10.1137/1.9781611971408 – year: 1979 ident: ref29 article-title: Fortran Subroutines for Out-of-Core Solutions of Large Complex Linear Systems – ident: ref19 doi: 10.1109/TPDS.2007.70813 – ident: ref30 doi: 10.1145/1248377.1248394 – ident: ref20 doi: 10.3233/SPR-2008-0268 – ident: ref18 doi: 10.1109/TPDS.2007.70813 – ident: ref11 doi: 10.1007/BFb0095328 – ident: ref23 doi: 10.1145/1377612.1377615 – ident: ref7 doi: 10.1002/cpe.1301 – ident: ref6 doi: 10.1137/050636723 – ident: ref12 doi: 10.1147/rd.444.0605 – ident: ref16 doi: 10.1016/S0167-8191(99)00041-1 – ident: ref10 doi: 10.1175/mwr3289.1 – ident: ref26 doi: 10.1137/0910005 |
| SSID | ssj0014504 |
| Score | 2.006519 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 417 |
| SubjectTerms | Algorithms; Application software; Architecture; Bandwidth; Bidiagonal reduction; Computer programs; Factorization; Hardware; Linear algebra; Lists; Matrix decomposition; Multicore processing; multicores; Preprocessing; Reduction; Singular value decomposition; Software; Software libraries; Software performance; tile algorithms |
| Title | Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures |
| URI | https://ieeexplore.ieee.org/document/4967575 https://www.proquest.com/docview/911999993 https://www.proquest.com/docview/1671263946 https://www.proquest.com/docview/743725943 |
| Volume | 21 |