The limits and effectiveness of data prefetching on scalable multiprocessors
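| Field | Value |
|---|---|
| Published in: | Performance evaluation, Vol. 27; pp. 209-229 |
| Main authors: | Mao, Weihua; Saavedra, Rafael H. |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 1996 |
| Subjects: | Analytical modelling; Distributed shared-memory multiprocessors; Execution-driven simulation; Software data prefetching |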
| ISSN: | 0166-5316, 1872-745X |
| Field | Value |
|---|---|
| Abstract | Prefetching is a promising technique for hiding and tolerating the large memory latencies expected in scalable multiprocessors. In this paper, we present and validate an analytical performance model for software-controlled data prefetching. The model incorporates all the important aspects affecting the performance of prefetching, such as program behavior, network topology, cache coherency protocols, and memory consistency models. We use execution-driven simulation to validate the predictions of the model with respect to overall speedup, average memory latency, and cache pollution. We show that the model provides accurate predictions for programs that do not saturate the bandwidth of the network. The model could be used by compilers and/or programmers to determine when to issue prefetch instructions to maximize the speedup that can be obtained from software-controlled prefetching. |
| Author | Mao, Weihua; Saavedra, Rafael H. (saavedra@cs.usc.edu) |
| ContentType | Journal Article |
| Copyright | 1996 |
| DOI | 10.1016/S0166-5316(96)90028-0 |
| Discipline | Computer Science |
| EISSN | 1872-745X |
| EndPage | 229 |
| ISSN | 0166-5316 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Execution-driven simulation; Software data prefetching; Analytical modelling; Distributed shared-memory multiprocessors |
| Language | English |
| License | https://www.elsevier.com/tdm/userlicense/1.0 |
| PageCount | 21 |
| PublicationCentury | 1900 |
| PublicationDate | 1996-00-00 |
| PublicationDateYYYYMMDD | 1996-01-01 |
| PublicationDecade | 1990 |
| PublicationTitle | Performance evaluation |
| PublicationYear | 1996 |
| Publisher | Elsevier B.V |
| StartPage | 209 |
| SubjectTerms | Analytical modelling; Distributed shared-memory multiprocessors; Execution-driven simulation; Software data prefetching |
| Title | The limits and effectiveness of data prefetching on scalable multiprocessors |
| URI | https://dx.doi.org/10.1016/S0166-5316(96)90028-0 |
| Volume | 27 |
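As a rough illustration of the question the abstract says the model answers (when to issue a prefetch instruction), the Python sketch below uses a much simpler toy model than the paper's analytical model. It assumes a fixed remote-miss latency with no network contention, exactly one miss per loop iteration, and a constant cost per prefetch instruction. The function names and parameters (`prefetch_distance`, `residual_stall`, `loop_speedup`, `overhead_cycles`) are hypothetical and are not taken from the paper.

```python
import math

# Toy latency-hiding model (hypothetical; NOT the analytical model from the paper).
# Assumptions: constant miss latency (no network contention), one miss per
# iteration, and a fixed instruction overhead for each prefetch.


def prefetch_distance(latency_cycles: int, iter_cycles: int) -> int:
    """Number of iterations ahead a prefetch must be issued to fully cover the latency."""
    if latency_cycles < 0 or iter_cycles <= 0:
        raise ValueError("need latency_cycles >= 0 and iter_cycles > 0")
    return math.ceil(latency_cycles / iter_cycles)


def residual_stall(latency_cycles: int, iter_cycles: int, distance: int) -> int:
    """Cycles an access still waits when its prefetch was issued `distance` iterations early."""
    return max(0, latency_cycles - distance * iter_cycles)


def loop_speedup(latency_cycles: int, work_cycles: int, distance: int,
                 overhead_cycles: int = 0) -> float:
    """Steady-state per-iteration speedup of prefetching over no prefetching."""
    t_without = work_cycles + latency_cycles      # every miss stalls for the full latency
    iter_cycles = work_cycles + overhead_cycles   # issuing the prefetch costs extra cycles
    t_with = iter_cycles + residual_stall(latency_cycles, iter_cycles, distance)
    return t_without / t_with


if __name__ == "__main__":
    latency, work, overhead = 100, 30, 2
    d = prefetch_distance(latency, work + overhead)
    # Sweep distances from 0 (overhead only, no latency hidden) to just past d.
    for dist in range(0, d + 2):
        print(dist, residual_stall(latency, work + overhead, dist),
              round(loop_speedup(latency, work, dist, overhead), 2))
```

In this toy model, distances beyond `prefetch_distance(...)` bring no further gain, and a distance of 0 is slower than not prefetching because only the overhead is paid. On a real machine, prefetching too early can also evict useful data (the cache pollution the paper measures), and contention raises the effective latency once the network saturates; the abstract restricts its accuracy claim to programs that do not saturate the network bandwidth.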