Parallel Performance Evaluation and Optimization

This chapter covers the most important aspects of shared‐memory parallel programming that impact performance. It gives guidance for diagnosing such issues in order to assist in performance tuning. The chapter overviews the performance impact of cache coherence, and presents the guidelines for minimi...
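The full abstract in this record names avoiding false sharing as one guideline for limiting cache-coherence overhead. The following is a minimal sketch of that idea, not the chapter's own code. It assumes C++17 (for over-aligned allocation in `std::vector`) and a 64-byte cache line. The names `kCacheLine`, `PaddedCounter`, and `count_in_parallel` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache-line size; 64 bytes is common on x86-64 but not universal.
constexpr std::size_t kCacheLine = 64;

// Aligning each counter to a cache line also pads it to a full line, so
// threads that update neighbouring counters do not share a line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// Hypothetical helper: every thread increments only its own padded counter,
// and the per-thread results are summed after all threads have joined.
std::uint64_t count_in_parallel(unsigned num_threads,
                                std::uint64_t increments_per_thread) {
    assert(num_threads > 0);
    std::vector<PaddedCounter> counters(num_threads);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, increments_per_thread] {
            for (std::uint64_t i = 0; i < increments_per_thread; ++i) {
                counters[t].value += 1;  // private to thread t
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    std::uint64_t total = 0;
    for (const auto& c : counters) {
        total += c.value;
    }
    return total;
}
```

Without the `alignas`, several 8-byte counters would typically fall on the same cache line, and each write could invalidate that line in the other cores' caches even though no data is logically shared. An optimizing compiler may keep the counter in a register inside the loop, so any timing comparison would need care.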

Bibliographic details
Published in:	Programming multi‐core and many‐core computing systems, pp. 343-362
Main author:	Shafi, Hazim
Format:	Chapter
Language:	English
Publication details:	Hoboken, NJ, USA: John Wiley & Sons, Inc., 24 January 2017
Subjects:	cache coherence; nonuniform memory access; overlapping latency; parallel performance optimization; parallel performance tuning; shared‐memory parallel programming
ISBN:	0470936908, 9780470936900

Abstract	This chapter covers the most important aspects of shared‐memory parallel programming that impact performance. It gives guidance for diagnosing such issues in order to assist in performance tuning. The chapter overviews the performance impact of cache coherence, and presents the guidelines for minimizing these overheads: minimize write sharing and avoid false sharing. Nonuniform memory access (NUMA) systems present a challenge to application performance because, depending on where a thread is running and which memory address it's accessing, the performance of the application may vary. This presents developers with the additional burden of ensuring that their applications do not suffer from NUMA latency effects. The chapter describes how this may be accomplished. I/O latency can be a major source of serialization in a parallel application. The best way to deal with I/O is to overlap it with other work when possible.
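The abstract's closing advice, to overlap I/O with other work, can be sketched with a simple double-buffering loop. This is an illustrative sketch under stated assumptions, not the chapter's method: `load_block` simulates a blocking read with a sleep, and `load_block`, `process_block`, and `process_all` are hypothetical names.

```cpp
#include <chrono>
#include <cstdint>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Stand-in for a blocking read; the sleep models device latency (assumption).
std::vector<int> load_block(int block_id) {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return std::vector<int>(1000, block_id);
}

// Stand-in for CPU-bound work on a block that has already been loaded.
std::int64_t process_block(const std::vector<int>& block) {
    return std::accumulate(block.begin(), block.end(), std::int64_t{0});
}

// Double buffering: the load of block i+1 is started before block i is
// processed, so the simulated I/O wait overlaps with computation.
std::int64_t process_all(int num_blocks) {
    if (num_blocks <= 0) {
        return 0;
    }
    std::int64_t total = 0;
    std::future<std::vector<int>> next =
        std::async(std::launch::async, load_block, 0);
    for (int i = 0; i < num_blocks; ++i) {
        std::vector<int> current = next.get();  // wait for block i
        if (i + 1 < num_blocks) {
            // Kick off the next load before doing any work on block i.
            next = std::async(std::launch::async, load_block, i + 1);
        }
        total += process_block(current);  // runs while block i+1 loads
    }
    return total;
}
```

A real application would more likely use the platform's asynchronous I/O facilities than a helper thread per read. The loop structure, issuing the next request before consuming the current result, is the part that carries over.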
Author	Shafi, Hazim
Author_xml	– sequence: 1 givenname: Hazim surname: Shafi fullname: Shafi, Hazim organization: One Microsoft Way
ContentType	Book Chapter
Copyright	Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved
Copyright_xml	– notice: Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved
DOI	10.1002/9781119332015.ch17
DatabaseTitleList
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISBN	9781119332015 111933201X
Editor	Xhafa, Fatos; Pllana, Sabri
Editor_xml	– sequence: 1 givenname: Sabri surname: Pllana fullname: Pllana, Sabri – sequence: 2 givenname: Fatos surname: Xhafa fullname: Xhafa, Fatos
EndPage	362
ExternalDocumentID	10.1002/9781119332015.ch17
Genre	chapter
ISBN	0470936908 9780470936900
IngestDate	Sat Nov 15 22:27:50 EST 2025 Wed Nov 27 04:53:37 EST 2019
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	OpenURL
PageCount	20
ParticipantIDs	wiley_ebooks_10_1002_9781119332015_ch17_ch17
PublicationCentury	2000
PublicationDate	2017-01-24
PublicationDateYYYYMMDD	2017-01-24
PublicationDate_xml	– month: 01 year: 2017 text: 2017-01-24 day: 24
PublicationDecade	2010
PublicationPlace	Hoboken, NJ, USA
PublicationPlace_xml	– name: Hoboken, NJ, USA
PublicationTitle	Programming multi‐core and many‐core computing systems
PublicationYear	2017
Publisher	John Wiley & Sons, Inc
Publisher_xml	– name: John Wiley & Sons, Inc
SourceID	wiley
SourceType	Enrichment Source Publisher
StartPage	343
SubjectTerms	cache coherence; nonuniform memory access; overlapping latency; parallel performance optimization; parallel performance tuning; shared‐memory parallel programming
Title	Parallel Performance Evaluation and Optimization
URI	https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119332015.ch17
linkProvider	ProQuest Ebooks