Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures

The objective of this paper is to extend, in the context of multicore architectures, the concepts of tile algorithms [Buttari et al., 2007] for Cholesky, LU, and QR factorizations to the family of two-sided factorizations. In particular, the bidiagonal reduction of a general, dense matrix is very of...

Bibliographic details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 21, No. 4, pp. 417-423
Main authors: Ltaief, H., Kurzak, J., Dongarra, J.
Format: Journal Article
Language: English
Published: New York, IEEE, 01.04.2010
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1045-9219, 1558-2183
Abstract: The objective of this paper is to extend, in the context of multicore architectures, the concepts of tile algorithms [Buttari et al., 2007] for Cholesky, LU, and QR factorizations to the family of two-sided factorizations. In particular, the bidiagonal reduction of a general, dense matrix is very often used as a preprocessing step for calculating the Singular Value Decomposition. Furthermore, in the Top500 list of June 2008, 98 percent of the fastest parallel systems in the world were based on multicores. This confronts the scientific software community with both a daunting challenge and a unique opportunity. The challenge arises from the disturbing mismatch between the design of systems based on this new chip architecture (hundreds of thousands of nodes, a million or more cores, reduced bandwidth and memory available to cores) and the components of the traditional software stack, such as numerical libraries, on which scientific applications have relied for their accuracy and performance. The many-core trend has further exacerbated the problem, and it becomes critical to efficiently integrate existing or new numerical linear algebra algorithms suitable for such hardware. By exploiting the concept of tile algorithms in the multicore environment (i.e., high level of parallelism with fine granularity and high-performance data representation combined with a dynamic data-driven execution), the band bidiagonal reduction presented here achieves 94 Gflop/s on a 12,000 × 12,000 matrix with 16 Intel Tigerton 2.4 GHz processors. The main drawback of the tile algorithms approach for the bidiagonal reduction is that the full reduction cannot be obtained in one stage. Other methods have to be considered to further reduce the band matrix to the required form.
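To make the term "band bidiagonal form" in the abstract concrete, the following is a minimal sequential sketch in Python with NumPy. It is not the paper's tile algorithm and has none of its parallelism or dynamic scheduling; it is an assumed, simplified dense blocked variant that alternates a QR factorization on a column panel with an LQ factorization on the matching block row. The function name `band_bidiagonal` and the block size parameter `nb` are hypothetical names chosen for illustration.

```python
import numpy as np


def band_bidiagonal(a, nb):
    """Reduce a to upper band form with nb superdiagonals (illustrative sketch).

    Only orthogonal transforms are applied from the left and the right,
    so the singular values of the result match those of the input.
    """
    b = np.array(a, dtype=float)
    m, n = b.shape
    for k in range(0, min(m, n), nb):
        # Left transform: a QR factorization of the column panel zeroes
        # everything below the diagonal of that panel.
        q, _ = np.linalg.qr(b[k:, k:k + nb], mode="complete")
        b[k:, k:] = q.T @ b[k:, k:]
        # Right transform: an LQ factorization of the block row to the right
        # of the panel (computed as a QR of its transpose) leaves that block
        # row lower triangular, which bounds the band width by nb.
        if k + nb < n:
            q, _ = np.linalg.qr(b[k:k + nb, k + nb:].T, mode="complete")
            b[k:, k + nb:] = b[k:, k + nb:] @ q
    return b
```

The result is not yet bidiagonal: it still has `nb` superdiagonals, which is consistent with the abstract's remark that the full reduction cannot be obtained in one stage and that a further step is needed to reduce the band matrix.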
Author Ltaief, H.
Dongarra, J.
Kurzak, J.
Author_xml – sequence: 1
  givenname: H.
  surname: Ltaief
  fullname: Ltaief, H.
  organization: Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
– sequence: 2
  givenname: J.
  surname: Kurzak
  fullname: Kurzak, J.
  organization: Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
– sequence: 3
  givenname: J.
  surname: Dongarra
  fullname: Dongarra, J.
  organization: Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
CODEN ITDSEO
CitedBy_id crossref_primary_10_1109_TCSVT_2022_3145341
crossref_primary_10_1177_1094342013502097
crossref_primary_10_1145_2887740
crossref_primary_10_1137_17M1117732
crossref_primary_10_1145_3764932
crossref_primary_10_1007_s11075_013_9744_5
crossref_primary_10_1016_j_parco_2017_10_004
crossref_primary_10_1109_TPDS_2012_161
crossref_primary_10_1145_2894747
crossref_primary_10_1002_cpe_3306
crossref_primary_10_1145_2450153_2450154
Cites_doi 10.1137/0702016
10.1137/1.9780898719574
10.1145/355984.355990
10.1145/1055531.1055534
10.1016/0167-8191(95)00064-X
10.1109/PDP.2008.37
10.1016/j.laa.2004.09.019
10.1016/S0024-3795(01)00569-9
10.1137/1.9780898719604
10.1016/0167-8191(95)00015-g
10.1016/0010-4655(96)00017-3
10.1007/3-540-70734-4_9
10.1137/1.9781611971408
10.1109/TPDS.2007.70813
10.1145/1248377.1248394
10.3233/SPR-2008-0268
10.1007/BFb0095328
10.1145/1377612.1377615
10.1002/cpe.1301
10.1137/050636723
10.1147/rd.444.0605
10.1016/S0167-8191(99)00041-1
10.1175/mwr3289.1
10.1137/0910005
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Apr 2010
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Apr 2010
DOI 10.1109/TPDS.2009.79
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
Discipline Engineering
Computer Science
Architecture
EISSN 1558-2183
EndPage 423
ExternalDocumentID 2543367921
10_1109_TPDS_2009_79
4967575
Genre orig-research
ISICitedReferencesCount 11
ISSN 1045-9219
IsPeerReviewed true
IsScholarly true
Issue 4
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
PageCount 7
PublicationCentury 2000
PublicationDate 2010-April
2010-4-00
20100401
PublicationDateYYYYMMDD 2010-04-01
PublicationDate_xml – month: 04
  year: 2010
  text: 2010-April
PublicationDecade 2010
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on parallel and distributed systems
PublicationTitleAbbrev TPDS
PublicationYear 2010
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref12
Golub (ref14) 1996
ref15
ref30
ref11
ref10
ref17
ref16
ref19
ref18
Ltaief (ref22) 2008
ref24
ref23
ref26
ref25
ref20
ref21
Trefethen (ref28) 1997
ref8
Yip (ref29) 1979
ref7
ref9
ref4
ref3
ref6
ref5
Stewart (ref27) 1998
References_xml – ident: ref15
  doi: 10.1137/0702016
– volume-title: Numerical Linear Algebra
  year: 1997
  ident: ref28
  doi: 10.1137/1.9780898719574
– ident: ref8
  doi: 10.1145/355984.355990
– volume-title: Matrix Computation
  year: 1996
  ident: ref14
– ident: ref17
  doi: 10.1145/1055531.1055534
– ident: ref21
  doi: 10.1016/0167-8191(95)00064-X
– ident: ref24
  doi: 10.1109/PDP.2008.37
– ident: ref4
  doi: 10.1016/j.laa.2004.09.019
– ident: ref25
  doi: 10.1016/S0024-3795(01)00569-9
– ident: ref3
  doi: 10.1137/1.9780898719604
– ident: ref5
  doi: 10.1016/0167-8191(95)00015-g
– ident: ref9
  doi: 10.1016/0010-4655(96)00017-3
– ident: ref13
  doi: 10.1007/3-540-70734-4_9
– year: 2008
  ident: ref22
  article-title: LAPACK Working Note 208: Parallel Block Hessenberg Reduction Using Algorithms-by-Tiles for Multicore Architectures Revisited
– volume-title: Matrix Algorithms Volume I: Matrix Decompositions
  year: 1998
  ident: ref27
  doi: 10.1137/1.9781611971408
– year: 1979
  ident: ref29
  article-title: Fortran Subroutines for Out-of-Core Solutions of Large Complex Linear Systems
– ident: ref19
  doi: 10.1109/TPDS.2007.70813
– ident: ref30
  doi: 10.1145/1248377.1248394
– ident: ref20
  doi: 10.3233/SPR-2008-0268
– ident: ref18
  doi: 10.1109/TPDS.2007.70813
– ident: ref11
  doi: 10.1007/BFb0095328
– ident: ref23
  doi: 10.1145/1377612.1377615
– ident: ref7
  doi: 10.1002/cpe.1301
– ident: ref6
  doi: 10.1137/050636723
– ident: ref12
  doi: 10.1147/rd.444.0605
– ident: ref16
  doi: 10.1016/S0167-8191(99)00041-1
– ident: ref10
  doi: 10.1175/mwr3289.1
– ident: ref26
  doi: 10.1137/0910005
StartPage 417
SubjectTerms Algorithms
Application software
Architecture
Bandwidth
Bidiagonal reduction
Computer programs
Factorization
Hardware
Linear algebra
Lists
Matrix decomposition
Multicore processing
multicores
Preprocessing
Reduction
Singular value decomposition
Software
Software libraries
Software performance
tile algorithms
Title Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures
URI https://ieeexplore.ieee.org/document/4967575
https://www.proquest.com/docview/911999993
https://www.proquest.com/docview/1671263946
https://www.proquest.com/docview/743725943
Volume 21