Algorithmic optimizations of a conjugate gradient solver on shared memory architectures
| Published in: | International journal of parallel, emergent and distributed systems, Vol. 21, No. 5, pp. 345-363 |
|---|---|
| Main authors: | Löf, Henrik; Rantakokko, Jarmo |
| Format: | Journal Article |
| Language: | English |
| Published: | Taylor & Francis Group, 1 October 2006 |
| Subjects: | Bandwidth minimization; Conjugate gradients; Iterative solvers; OpenMP; Reversed Cuthill-McKee; Shared memory programming |
| ISSN: | 1744-5760, 1744-5779 |
| Online access: | Full text |
| Abstract | OpenMP is an architecture-independent language for programming in the shared memory model. OpenMP is designed to be simple and powerful in terms of programming abstractions. Unfortunately, the architecture-independent abstractions sometimes come at the price of low parallel performance. This is especially true for applications with an unstructured data access pattern running on distributed shared memory (DSM) systems. Here, proper data distribution and algorithmic optimizations play a vital role for performance. In this article, we have investigated ways of improving the performance of an industrial-class conjugate gradient (CG) solver, implemented in OpenMP and running on two types of shared memory systems.
We have evaluated bandwidth minimization, graph partitioning and reformulations of the original algorithm that reduce the number of global barriers. Through a detailed analysis of barrier time and memory system performance, we found that bandwidth minimization is the most important optimization, reducing both L2 misses and remote memory accesses. On a uniform memory system, we get perfect scaling. On a NUMA system, the performance is significantly improved by the algorithmic optimizations, leaving the system-dependent global reduction operations as a bottleneck. |
|---|---|
| Author | Löf, Henrik; Rantakokko, Jarmo |
| Author_xml | 1. Löf, Henrik (henrik.lof@it.uu.se), Uppsala University, Department of Information Technology; 2. Rantakokko, Jarmo, Uppsala University, Department of Information Technology |
| BackLink | https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-80937 (record in the Swedish Publication Index, Uppsala universitet) |
| ContentType | Journal Article |
| Copyright | Copyright Taylor & Francis Group, LLC 2006 |
| DOI | 10.1080/17445760600568139 |
| Discipline | Computer Science |
| EISSN | 1744-5779 |
| EndPage | 363 |
| ISICitedReferencesCount | 1 |
| ISSN | 1744-5760 1744-5779 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Language | English |
| PageCount | 19 |
| PublicationDate | 2006-10-01 |
| PublicationTitle | International journal of parallel, emergent and distributed systems |
| PublicationYear | 2006 |
| Publisher | Taylor & Francis Group |
| StartPage | 345 |
| SubjectTerms | Bandwidth minimization; Conjugate gradients; Iterative solvers; OpenMP; Reversed Cuthill-McKee; Shared memory programming |
| Title | Algorithmic optimizations of a conjugate gradient solver on shared memory architectures |
| URI | https://www.tandfonline.com/doi/abs/10.1080/17445760600568139 https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-80937 |
| Volume | 21 |
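
The abstract singles out bandwidth minimization as the most important optimization, and the subject terms name Reversed Cuthill-McKee (RCM). The following is a minimal sketch of that idea, not the authors' implementation. It is written in plain Python for readability, whereas the solver in the article is implemented in OpenMP. The function names `bandwidth` and `reverse_cuthill_mckee` and the small test graph are assumptions made for illustration. The graph stands for the off-diagonal nonzero pattern of a symmetric sparse matrix, and a smaller bandwidth means each matrix row touches vector entries that lie closer together in memory.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest distance |pos[u] - pos[v]| over all edges under an ordering."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def reverse_cuthill_mckee(adj):
    """Return an RCM ordering of an undirected graph.

    adj maps each vertex to the set of its neighbours.
    """
    degree = {v: len(adj[v]) for v in adj}
    visited = set()
    order = []
    # Cover every connected component, seeding each from a low-degree vertex.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search; enqueue new neighbours by increasing degree.
            for w in sorted(adj[v] - visited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    order.reverse()  # reversing the Cuthill-McKee order gives RCM
    return order


# Hypothetical example: a simple chain of 8 unknowns whose labels are scrambled,
# so neighbouring unknowns are far apart in the natural numbering.
path = [0, 4, 1, 5, 2, 6, 3, 7]
adj = {v: set() for v in path}
for a, b in zip(path, path[1:]):
    adj[a].add(b)
    adj[b].add(a)

natural = sorted(adj)
rcm = reverse_cuthill_mckee(adj)
print("bandwidth, natural order:", bandwidth(adj, natural))
print("bandwidth, RCM order:    ", bandwidth(adj, rcm))
```

In this toy case the reordering recovers the chain structure, so every unknown ends up next to its neighbours. On real unstructured meshes the reduction is less dramatic, but the mechanism is the same one the abstract credits with lowering L2 misses and remote memory accesses.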