Parallel Performance Evaluation and Optimization


Published in: Programming multi‐core and many‐core computing systems, pp. 343-362
Main author: Shafi, Hazim
Format: Chapter
Language: English
Publication details: Hoboken, NJ, USA: John Wiley & Sons, Inc, 24.01.2017
ISBN: 0470936908, 9780470936900
Abstract: This chapter covers the aspects of shared‐memory parallel programming that most affect performance, and gives guidance for diagnosing the resulting issues to assist in performance tuning. It gives an overview of the performance impact of cache coherence and presents two guidelines for minimizing that overhead: minimize write sharing and avoid false sharing. Nonuniform memory access (NUMA) systems present a challenge because application performance may vary depending on where a thread is running and which memory address it is accessing. This places on developers the additional burden of ensuring that their applications do not suffer from NUMA latency effects, and the chapter describes how this may be accomplished. I/O latency can be a major source of serialization in a parallel application. The best way to deal with I/O is to overlap it with other work when possible.
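The abstract's last recommendation, overlapping I/O with other work, can be illustrated with a minimal sketch. It is not taken from the chapter: the function names `simulated_read`, `compute`, and `overlapped` are hypothetical, and the blocking I/O call is assumed to be stood in for by `time.sleep`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_read(delay_s):
    """Hypothetical stand-in for a blocking I/O call (file or network read)."""
    time.sleep(delay_s)  # the worker thread blocks here, as it would on real I/O
    return b"payload"


def compute(n):
    """Hypothetical CPU-side work that does not depend on the I/O result."""
    return sum(i * i for i in range(n))


def overlapped(delay_s, n):
    """Start the I/O on a worker thread, do independent work, then join."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(simulated_read, delay_s)  # I/O starts in the background
        total = compute(n)        # runs while the I/O is still in flight
        data = pending.result()   # block only at the point the data is needed
    return data, total


if __name__ == "__main__":
    data, total = overlapped(0.05, 100_000)
    print(len(data), total)
```

The design point is that the wait for the I/O result is deferred until the result is actually required, so the independent computation is not serialized behind the I/O latency.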
Author: Shafi, Hazim
Organization: One Microsoft Way
Content type: Book chapter
Copyright: Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved
DOI: 10.1002/9781119332015.ch17
Discipline: Computer Science
EISBN: 9781119332015, 111933201X
Editors: Pllana, Sabri; Xhafa, Fatos
Peer reviewed: No
Scholarly: No
Page count: 20
Subject terms: cache coherence; nonuniform memory access; overlapping latency; parallel performance optimization; parallel performance tuning; shared‐memory parallel programming
URI: https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119332015.ch17