Algorithmic optimizations of a conjugate gradient solver on shared memory architectures

Detailed description

Bibliographic details
Published in: International journal of parallel, emergent and distributed systems, Vol. 21, No. 5, pp. 345-363
Main authors: Löf, Henrik; Rantakokko, Jarmo
Format: Journal Article
Language: English
Published: Taylor & Francis Group, 01.10.2006
ISSN: 1744-5760, 1744-5779
Abstract OpenMP is an architecture-independent language for programming in the shared memory model. It is designed to be simple and powerful in terms of programming abstractions. Unfortunately, these architecture-independent abstractions sometimes come at the price of low parallel performance. This is especially true for applications with an unstructured data access pattern running on distributed shared memory (DSM) systems. Here, proper data distribution and algorithmic optimizations play a vital role in performance. In this article, we have investigated ways of improving the performance of an industrial-class conjugate gradient (CG) solver implemented in OpenMP and running on two types of shared memory systems. We have evaluated bandwidth minimization, graph partitioning, and reformulations of the original algorithm that reduce global barriers. Through a detailed analysis of barrier time and memory system performance, we found that bandwidth minimization is the most important optimization, reducing both L2 misses and remote memory accesses. On a uniform memory system, we get perfect scaling. On a NUMA system, performance is significantly improved by the algorithmic optimizations, leaving the system-dependent global reduction operations as a bottleneck.
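The abstract names bandwidth minimization as the most important optimization, and the subject terms of this record list reverse Cuthill-McKee (RCM) ordering. As a rough illustration only, the following Python sketch shows a plain textbook RCM pass on a small graph and how it narrows the matrix bandwidth. It is not the authors' implementation: the function names, the tie-breaking rule, the choice of start vertex, and the example graph are all assumptions made for this sketch.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest |i - j| over all edges once vertices are numbered by `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def reverse_cuthill_mckee(adj):
    """Return a vertex ordering from a simple reverse Cuthill-McKee pass.

    `adj` maps each vertex to the set of its neighbours (symmetric graph).
    Assumption: each component starts from its lowest-degree vertex.
    """
    degree = {v: len(adj[v]) for v in adj}
    visited = set()
    order = []
    # Restart from the lowest-degree unvisited vertex for each component.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbours in order of increasing degree.
            for w in sorted(adj[v] - visited, key=lambda w: (degree[w], w)):
                visited.add(w)
                queue.append(w)
    # Reversing the Cuthill-McKee order gives the RCM order.
    order.reverse()
    return order


# Hypothetical example: a path graph whose vertices carry scrambled labels.
edges = [(0, 3), (3, 1), (1, 5), (5, 2), (2, 4)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

natural = list(range(6))
rcm = reverse_cuthill_mckee(adj)
print(bandwidth(adj, natural), bandwidth(adj, rcm))  # prints: 4 1
```

A narrower bandwidth means that each row of a sparse matrix-vector product reads vector entries that lie close together in memory, which is consistent with the reduction in L2 misses and remote memory accesses that the abstract reports.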
Author Löf, Henrik
Rantakokko, Jarmo
Author_xml – sequence: 1
  givenname: Henrik
  surname: Löf
  fullname: Löf, Henrik
  email: henrik.lof@it.uu.se
  organization: Uppsala University, Department of Information Technology
– sequence: 2
  givenname: Jarmo
  surname: Rantakokko
  fullname: Rantakokko, Jarmo
  organization: Uppsala University, Department of Information Technology
BackLink https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-80937 (View record from Swedish Publication Index, Uppsala universitet)
ContentType Journal Article
Copyright Copyright Taylor & Francis Group, LLC 2006
Copyright_xml – notice: Copyright Taylor & Francis Group, LLC 2006
DOI 10.1080/17445760600568139
Discipline Computer Science
EISSN 1744-5779
EndPage 363
ISICitedReferencesCount 1
ISSN 1744-5760
1744-5779
IsPeerReviewed true
IsScholarly true
Issue 5
Language English
PageCount 19
PublicationDate 2006-10-01
PublicationDateYYYYMMDD 2006-10-01
PublicationDate_xml – month: 10
  year: 2006
  text: 2006-10-01
  day: 01
PublicationDecade 2000
PublicationTitle International journal of parallel, emergent and distributed systems
PublicationYear 2006
Publisher Taylor & Francis Group
Publisher_xml – name: Taylor & Francis Group
StartPage 345
SubjectTerms Bandwidth minimization
Conjugate gradients
Iterative solvers
OpenMP
Reversed Cuthill-McKee
Shared memory programming
Title Algorithmic optimizations of a conjugate gradient solver on shared memory architectures
URI https://www.tandfonline.com/doi/abs/10.1080/17445760600568139
https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-80937
Volume 21