L2 Cache Modeling for Scientific Applications on Chip Multi-Processors
| Published in: | Proceedings of the International Conference on Parallel Processing, p. 51 |
|---|---|
| Main Authors: | Fengguang Song, S. Moore, J. Dongarra |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.09.2007 |
| ISSN: | 0190-3918 |
| Abstract | It is critical to provide high performance for scientific applications running on chip multi-processors (CMP). A CMP architecture often comprises a shared L2 cache and lower-level storage. The shared L2 cache can reduce the number of cache misses if the data are accessed in common by several threads, but it can also lead to performance degradation due to resource contention. Running threads on all cores can sometimes cause severe contention and greatly increase the number of cache misses. To investigate how the performance of a thread varies when it runs concurrently with other threads on the remaining cores, we develop an analytical model to predict the number of misses on the shared L2 cache. In particular, we apply the model to thread-parallel numerical programs. We assume that all the threads compute homogeneous tasks and share a fully associative L2 cache. We use circular sequence profiling and stack processing techniques to analyze the L2 cache trace and predict, separately, the number of compulsory cache misses, capacity cache misses on shared data, and capacity cache misses on private data. Our method is able to predict the L2 cache performance for threads that have a global shared address space. For scientific applications, threads often have overlapping memory footprints. We use a cycle-accurate simulator to validate the model with three scientific programs: dense matrix multiplication, blocked dense matrix multiplication, and sparse matrix-vector product. The average relative errors for the three experiments are 8.01%, 1.85%, and 2.41%, respectively. |
|---|---|
| Author | Fengguang Song; S. Moore; J. Dongarra |
| Affiliation | Dept. of Comput. Sci., Univ. of Tennessee, Knoxville, TN (all three authors) |
| DOI | 10.1109/ICPP.2007.52 |
| Discipline | Architecture Computer Science |
| EISBN | 076952933X, 9780769529332 |
| PublicationTitleAbbrev | ICPP |
| SubjectTerms | Analytical models; Application software; architecture; cache; Capacity planning; chip multi-processor; Computer architecture; Computer science; Degradation; multi-threaded programming; Parallel processing; performance modeling; Predictive models; Sparse matrices; Yarn |
| URI | https://ieeexplore.ieee.org/document/4343858 |
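
The abstract separates L2 misses into compulsory misses, capacity misses on shared data, and capacity misses on private data for a fully associative shared cache. As a rough illustration of that classification only, and not the authors' analytical model, the sketch below replays a trace through a simulated fully associative cache. It assumes LRU replacement (the abstract does not name a policy), a trace given as a list of `(thread_id, line_address)` pairs, and a hypothetical function name `classify_misses`. A line counts as shared here if more than one thread touches it anywhere in the trace, which is also an assumption.

```python
from collections import OrderedDict


def classify_misses(trace, capacity_lines):
    """Classify accesses to a fully associative LRU cache (illustrative only).

    trace: list of (thread_id, line_address) pairs, in access order.
    capacity_lines: number of cache lines the shared cache can hold.
    """
    # Pass 1: record which threads touch each line; a line is treated as
    # "shared" if more than one thread ever accesses it (assumption).
    threads_by_line = {}
    for tid, line in trace:
        threads_by_line.setdefault(line, set()).add(tid)

    lru = OrderedDict()  # keys ordered from least to most recently used
    seen = set()         # lines referenced at least once so far
    counts = {"hits": 0, "compulsory": 0,
              "capacity_shared": 0, "capacity_private": 0}

    # Pass 2: replay the trace through the simulated cache.
    for tid, line in trace:
        if line in lru:
            lru.move_to_end(line)  # refresh recency on a hit
            counts["hits"] += 1
            continue
        if line not in seen:
            seen.add(line)
            counts["compulsory"] += 1        # first-ever reference
        elif len(threads_by_line[line]) > 1:
            counts["capacity_shared"] += 1   # evicted line used by several threads
        else:
            counts["capacity_private"] += 1  # evicted line used by one thread
        lru[line] = None
        if len(lru) > capacity_lines:
            lru.popitem(last=False)          # evict the least recently used line
    return counts


if __name__ == "__main__":
    demo = [(0, "A"), (1, "A"), (0, "B"), (0, "C"), (1, "A"), (0, "B")]
    print(classify_misses(demo, capacity_lines=2))
```

For a fully associative LRU cache, a re-reference misses exactly when the number of distinct lines touched since its previous use is at least the cache capacity, so this direct simulation gives the same hit/miss outcome as comparing stack distances against the capacity.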