L2 Cache Modeling for Scientific Applications on Chip Multi-Processors

It is critical to provide high performance for scientific applications running on chip multi-processors (CMP). A CMP architecture often comprises a shared L2 cache and lower-level storage. The shared L2 cache can reduce the number of cache misses if the data are accessed in common by several thread...

Bibliographic Details
Published in:	Proceedings of the International Conference on Parallel Processing p. 51
Main Authors:	Fengguang Song, Moore, S., Dongarra, J.
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01.09.2007
Subjects:	Analytical models; Application software; architecture; cache; Capacity planning; chip multi-processor; Computer architecture; Computer science; Degradation; multi-threaded programming; Parallel processing; performance modeling; Predictive models; Sparse matrices; Yarn
ISSN:	0190-3918

Abstract	It is critical to provide high performance for scientific applications running on chip multi-processors (CMP). A CMP architecture often comprises a shared L2 cache and lower-level storage. The shared L2 cache can reduce the number of cache misses if the data are accessed in common by several threads, but it can also lead to performance degradation due to resource contention. Sometimes running threads on all cores can cause severe contention and greatly increase the number of cache misses. To investigate how the performance of a thread varies when it runs concurrently with other threads on the remaining cores, we develop an analytical model to predict the number of misses on the shared L2 cache. In particular, we apply the model to thread-parallel numerical programs. We assume that all the threads compute homogeneous tasks and share a fully associative L2 cache. We use circular sequence profiling and stack processing techniques to analyze the L2 cache trace and predict the number of compulsory cache misses, capacity cache misses on shared data, and capacity cache misses on private data, respectively. Our method is able to predict the L2 cache performance for threads that have a global shared address space. For scientific applications, threads often have overlapping memory footprints. We use a cycle-accurate simulator to validate the model with three scientific programs: dense matrix multiplication, blocked dense matrix multiplication, and sparse matrix-vector product. The average relative errors for the three experiments are 8.01%, 1.85%, and 2.41%, respectively.
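The three miss categories named in the abstract can be illustrated with a small trace-driven sketch. This is not the authors' analytical model (which relies on circular sequence profiling and predicts misses rather than replaying them); it is a generic LRU stack-distance pass over an interleaved trace, assuming a fully associative cache as the abstract does. The function name `classify_misses`, the `(thread_id, block)` trace format, and the rule that a block is "shared" when more than one thread touches it are all assumptions made for illustration.

```python
from collections import OrderedDict


def classify_misses(trace, capacity_blocks):
    """Classify references in an interleaved trace of (thread_id, block) pairs.

    Hypothetical helper: assumes a fully associative LRU cache that holds
    `capacity_blocks` blocks and is shared by all threads.
    """
    trace = list(trace)

    # Assumption: a block is "shared" if more than one thread references it.
    owners = {}
    for tid, block in trace:
        owners.setdefault(block, set()).add(tid)

    stack = OrderedDict()  # unbounded LRU stack; most recently used block is last
    counts = {"hits": 0, "compulsory": 0,
              "capacity_shared": 0, "capacity_private": 0}

    for tid, block in trace:
        if block not in stack:
            # First reference to this block anywhere in the trace.
            counts["compulsory"] += 1
        else:
            keys = list(stack)
            # Stack distance: 1 for the most recently used block.
            distance = len(keys) - keys.index(block)
            if distance <= capacity_blocks:
                counts["hits"] += 1
            elif len(owners[block]) > 1:
                counts["capacity_shared"] += 1
            else:
                counts["capacity_private"] += 1
            del stack[block]
        stack[block] = None  # move (or push) the block to the top of the stack

    return counts
```

Because the stack is never truncated, a single pass yields the hit and miss counts for any cache size: a reference hits exactly when its stack distance does not exceed the capacity. The linear search for the stack position keeps the sketch short; it is quadratic in the worst case and would need a tree-based structure for long traces.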
Author affiliations	Fengguang Song; Moore, S.; Dongarra, J. (all: Dept. of Comput. Sci., Univ. of Tennessee, Knoxville, TN)
DOI	10.1109/ICPP.2007.52
Discipline	Architecture Computer Science
EISBN	076952933X 9780769529332
URI	https://ieeexplore.ieee.org/document/4343858