Parallel Performance Evaluation and Optimization
This chapter covers the most important aspects of shared‐memory parallel programming that impact performance. It gives guidance for diagnosing such issues in order to assist in performance tuning. The chapter overviews the performance impact of cache coherence, and presents the guidelines for minimi...
| Published in: | Programming multi‐core and many‐core computing systems, pp. 343-362 |
|---|---|
| Main author: | Shafi, Hazim |
| Format: | Book chapter |
| Language: | English |
| Publication details: | Hoboken, NJ, USA: John Wiley & Sons, Inc, 2017-01-24 |
| Subjects: | |
| ISBN: | 0470936908, 9780470936900 |
| Abstract | This chapter covers the most important aspects of shared‐memory parallel programming that impact performance. It gives guidance for diagnosing such issues in order to assist in performance tuning. The chapter overviews the performance impact of cache coherence, and presents the guidelines for minimizing these overheads: minimize write sharing and avoid false sharing. Nonuniform memory access (NUMA) systems present a challenge to application performance because, depending on where a thread is running and which memory address it's accessing, the performance of the application may vary. This presents developers with the additional burden of ensuring that their applications do not suffer from NUMA latency effects. The chapter describes how this may be accomplished. I/O latency can be a major source of serialization in a parallel application. The best way to deal with I/O is to overlap it with other work when possible. |
|---|---|
| Author | Shafi, Hazim |
| Author_xml | – sequence: 1 givenname: Hazim surname: Shafi fullname: Shafi, Hazim organization: One Microsoft Way |
| ContentType | Book Chapter |
| Copyright | Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved |
| Copyright_xml | – notice: Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved |
| DOI | 10.1002/9781119332015.ch17 |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9781119332015, 111933201X |
| Editor | Xhafa, Fatos; Pllana, Sabri |
| Editor_xml | – sequence: 1 givenname: Sabri surname: Pllana fullname: Pllana, Sabri – sequence: 2 givenname: Fatos surname: Xhafa fullname: Xhafa, Fatos |
| EndPage | 362 |
| ExternalDocumentID | 10.1002/9781119332015.ch17 |
| Genre | chapter |
| ID | FETCH-LOGICAL-s153f-d00457125fa70b65ef0bb74bc1b568928976db831c03e1ec4d21ab09d0b1d71f3 |
| ISBN | 0470936908 9780470936900 |
| IngestDate | Sat Nov 15 22:27:50 EST 2025 Wed Nov 27 04:53:37 EST 2019 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-s153f-d00457125fa70b65ef0bb74bc1b568928976db831c03e1ec4d21ab09d0b1d71f3 |
| PageCount | 20 |
| ParticipantIDs | wiley_ebooks_10_1002_9781119332015_ch17_ch17 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-01-24 |
| PublicationDateYYYYMMDD | 2017-01-24 |
| PublicationDate_xml | – month: 01 year: 2017 text: 2017-01-24 day: 24 |
| PublicationDecade | 2010 |
| PublicationPlace | Hoboken, NJ, USA |
| PublicationPlace_xml | – name: Hoboken, NJ, USA |
| PublicationTitle | Programming multi‐core and many‐core computing systems |
| PublicationYear | 2017 |
| Publisher | John Wiley & Sons, Inc |
| Publisher_xml | – name: John Wiley & Sons, Inc |
| References | [1] AMD CodeAnalyst Performance Analyzer. [2] Adve, Gharachorloo. Shared memory consistency models: a tutorial. IEEE Computer 29(12), 1996, pp. 66-76. [3] Amdahl. Validity of the single processor approach to achieving large‐scale computing capabilities. AFIPS Conference Proceedings, 1967, pp. 483-485. [4] Chen, Baer. Reducing memory latency via non‐blocking and prefetching caches. Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 1992, pp. 51-61. [5] Culler, Pal Singh, Gupta. Parallel Computer Architecture: A Hardware/Software Approach. 1999. [6] Mellor‐Crummey, Scott. Algorithms for scalable synchronization on shared‐memory multiprocessors. ACM Transactions on Computer Systems 9(1), 1991, pp. 21-65. [7] (c17-cit-0007). 2011. [8] Mowry. Tolerating Latency through Software‐Controlled Data Prefetching. 1994. [9] Park, Buch. Improve debugging and performance tuning with ETW. MSDN Magazine, 2007. [10] Reinders. VTune Performance Analyzer Essentials: Measurement and Tuning Techniques for Software Developers. 2005. [11] Shafi. Performance tuning with the Concurrency Visualizer in Visual Studio 2010. MSDN Magazine, 2010. |
| SSID | ssib050314874 ssib027811140 ssj0001756349 |
| Score | 1.50013 |
| SourceID | wiley |
| SourceType | Enrichment Source; Publisher |
| StartPage | 343 |
| SubjectTerms | cache coherence; nonuniform memory access; overlapping latency; parallel performance optimization; parallel performance tuning; shared‐memory parallel programming |
| Title | Parallel Performance Evaluation and Optimization |
| URI | https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119332015.ch17 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| link | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV1LS8QwEA7r4yAefOObHgQvVpMmbZqTB1kVBPWg4G1pXrjg1mVXRfz1TpJ2t1URPHgJJUybpl86nUxnvkHoIFPa6lTpuEhSGjNjWSysSmPLuADr1AgsrC82wa-v84cHcdvpnNa5MG9PvCzz93cx_FeooQ_Adqmzf4B7clHogGMAHVqAHdovFnHb9xqYMEK41cA5AHys4CSYwdFV-j8FA3j9273KV3bwjoUGf7nnbSxsqGpdfPQHzdV1W4xcEZYnF0E_STzoTpjD_UA3oI0GVZpn07lAXBRqnEydiz9E77RDNMM2FDOOXWFAjBuqkAb6peqrSoPO_aawAwGsI94iYEpSuIf0WD2GdM4vRNi_ic-gGc5Bzc1ddG_ur2qFkni5KZlN6rj688re9U44nmaUCc9UUE0hr1mZ6ilVmVYw8sn3cdtbGm-T3C2jRZenErkEErj_FdQx5Spaqqt0RJXSXkO4xipqYBVNsYoAq6iJ1Tq6P-_enV3GVZ2MeAzfKxtrZ5dzsFRtwbHMUmOxlJxJRWSa5QK21DzTMqdEYWqIUUwnpJBYaCyJ5sTSDTRbPpdmE0XKMF0YJjRRkhmuHGmsETllBYEzZb6Fjvx8e_5X_rgXSK-TXuvJ9NyT8c0WOmyJt8U--sMgOtR2-28X3kEL0-W6i2ZfRq9mD82rt5f-eLRfLYNPhWFYPA |
| linkProvider | ProQuest Ebooks |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.title=Programming+multi%E2%80%90core+and+many%E2%80%90core+computing+systems&rft.au=Shafi%2C+Hazim&rft.atitle=Parallel+Performance+Evaluation+and+Optimization&rft.date=2017-01-24&rft.pub=John+Wiley+%26+Sons%2C+Inc&rft.isbn=9780470936900&rft.spage=343&rft.epage=362&rft_id=info:doi/10.1002%2F9781119332015.ch17&rft.externalDocID=10.1002%2F9781119332015.ch17 |
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=9780470936900/lc.gif&client=summon&freeimage=true |
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=9780470936900/mc.gif&client=summon&freeimage=true |
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=9780470936900/sc.gif&client=summon&freeimage=true |
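
The abstract's advice to overlap I/O with other work can be illustrated with a minimal Python sketch. It is not taken from the chapter: `read_block`, `process`, `run_serial`, and `run_overlapped` are hypothetical names, and `time.sleep` is assumed as a stand-in for a blocking read. The overlapped version prefetches the next block on a helper thread while the current block is being processed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_block(i):
    """Hypothetical blocking read; the sleep mimics I/O latency."""
    time.sleep(0.01)
    return list(range(i * 4, i * 4 + 4))


def process(block):
    """Hypothetical CPU-side work on a block that has already arrived."""
    return sum(x * x for x in block)


def run_serial(n_blocks):
    # Each read must finish before its block is processed,
    # so I/O latency and compute time simply add up.
    return [process(read_block(i)) for i in range(n_blocks)]


def run_overlapped(n_blocks):
    # Prefetch block i+1 on a helper thread while block i is processed,
    # so the wait for I/O is hidden behind useful work where possible.
    results = []
    if n_blocks == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_block, 0)
        for i in range(n_blocks):
            block = pending.result()  # wait only if the read is not done yet
            if i + 1 < n_blocks:
                pending = pool.submit(read_block, i + 1)
            results.append(process(block))
    return results
```

Both functions return the same results; only the schedule differs. How much time the overlap saves depends on how the read latency compares with the processing time per block, which this sketch does not attempt to measure.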

