The limits and effectiveness of data prefetching on scalable multiprocessors

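Bibliographic details
Published in: Performance evaluation, Volume 27, pp. 209-229
Main authors: Mao, Weihua; Saavedra, Rafael H.
Format: Journal Article
Language: English
Published: Elsevier B.V., 1996
Subjects: Analytical modelling; Distributed shared-memory multiprocessors; Execution-driven simulation; Software data prefetching
ISSN: 0166-5316, 1872-745X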
Abstract: Prefetching is a promising technique for hiding and tolerating the large memory latencies expected in scalable multiprocessors. In this paper we present and validate an analytical performance model for software-controlled data prefetching. The model incorporates all the important aspects affecting the performance of prefetching such as: program behavior, network topology, cache coherency protocols, memory consistency models, etc. We use execution-driven simulation to validate the predictions of the model with respect to overall speedup, average memory latency, and cache pollution. We show that the model provides accurate predictions for programs that do not saturate the bandwidth of the network. The model could be used by compilers and/or programmers to determine when to issue prefetch instructions in order to maximize the speedup that can be obtained from software-controlled prefetching.
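The abstract's closing point, deciding when to issue prefetch instructions, can be illustrated with a deliberately simplified sketch. This is not the analytical model from the paper, which this record does not reproduce. It is a toy rule that assumes a fixed memory latency, a fixed amount of computation per loop iteration, unlimited network bandwidth, and no cache pollution. The function names `prefetch_distance` and `stall_per_access` and all parameter names are hypothetical.

```python
import math


def prefetch_distance(latency_cycles: int, work_cycles_per_iter: int) -> int:
    """Iterations ahead at which a prefetch must be issued so the data
    arrives before use. Toy rule (assumption): ceil(latency / work)."""
    if latency_cycles < 0 or work_cycles_per_iter <= 0:
        raise ValueError("latency must be >= 0 and work per iteration > 0")
    return math.ceil(latency_cycles / work_cycles_per_iter)


def stall_per_access(latency_cycles: int, work_cycles_per_iter: int,
                     distance: int) -> int:
    """Cycles still spent waiting when prefetching `distance` iterations
    ahead. Assumes unlimited bandwidth and no cache pollution, which is
    a hypothetical simplification and not the paper's model."""
    return max(0, latency_cycles - distance * work_cycles_per_iter)


if __name__ == "__main__":
    latency, work = 100, 30  # illustrative values, not measured data
    d = prefetch_distance(latency, work)
    print("distance:", d, "remaining stall:", stall_per_access(latency, work, d))
```

Under these assumptions, a prefetch issued too late hides only part of the latency. The effects the abstract lists (network topology, coherency protocol, consistency model, bandwidth saturation) are exactly what this sketch leaves out.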
Author Saavedra, Rafael H.
Mao, Weihua
Author_xml – sequence: 1
  givenname: Weihua
  surname: Mao
  fullname: Mao, Weihua
– sequence: 2
  givenname: Rafael H.
  surname: Saavedra
  fullname: Saavedra, Rafael H.
  email: saavedra@cs.usc.edu
Cites_doi 10.1016/0743-7315(91)90014-Z
10.1006/jpdc.1994.1102
10.1145/130823.130824
10.1109/12.53599
10.1109/71.97897
10.1109/2.121510
10.1126/science.231.4741.967
ContentType Journal Article
Copyright 1996
Copyright_xml – notice: 1996
DBID AAYXX
CITATION
DOI 10.1016/S0166-5316(96)90028-0
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1872-745X
EndPage 229
ExternalDocumentID 10_1016_S0166_5316_96_90028_0
S0166531696900280
ID FETCH-LOGICAL-c172t-82eddb7c9410f49f9d475c08edbb016e48e361043f45447e044d42a761784fcb3
ISSN 0166-5316
IngestDate Sat Nov 29 01:43:59 EST 2025
Fri Feb 23 02:31:01 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Execution-driven simulation
Software data prefetching
Analytical modelling
Distributed shared-memory multiprocessors
Language English
License https://www.elsevier.com/tdm/userlicense/1.0
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c172t-82eddb7c9410f49f9d475c08edbb016e48e361043f45447e044d42a761784fcb3
PageCount 21
ParticipantIDs crossref_primary_10_1016_S0166_5316_96_90028_0
elsevier_sciencedirect_doi_10_1016_S0166_5316_96_90028_0
PublicationCentury 1900
PublicationDate 1996-00-00
PublicationDateYYYYMMDD 1996-01-01
PublicationDate_xml – year: 1996
  text: 1996-00-00
PublicationDecade 1990
PublicationTitle Performance evaluation
PublicationYear 1996
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
SSID ssj0005730
Score 1.4294643
SourceID crossref
elsevier
SourceType Index Database
Publisher
StartPage 209
Title The limits and effectiveness of data prefetching on scalable multiprocessors
URI https://dx.doi.org/10.1016/S0166-5316(96)90028-0
Volume 27