Combining analytical and empirical approaches in tuning matrix transposition

Detailed bibliography
Published in: PACT 2006 : proceedings of the Fifteenth International Conference on Parallel Architectures and Compilation Techniques : September 16-20, 2006, Seattle, Washington, USA. pp. 233-242
Main authors: Lu, Qingda; Krishnamoorthy, Sriram; Sadayappan, P.
Format: Conference paper
Language: English
Publication details: ACM, 01.09.2006
Abstract Matrix transposition is an important kernel used in many applications. Even though its optimization has been the subject of many studies, an optimization procedure that targets the characteristics of current processor architectures has not been developed. In this paper, we develop an integrated optimization framework that addresses a number of issues, including tiling for the memory hierarchy, effective handling of memory misalignment, utilizing memory subsystem characteristics, and the exploitation of the parallelism provided by the vector instruction sets in current processors. A judicious combination of analytical and empirical approaches is used to determine the most appropriate optimizations. The absence of problem information until execution time is handled by generating multiple versions of the code; the best version is chosen at runtime, with assistance from minimal-overhead inspectors. The approach highlights aspects of empirical optimization that are important for similar computations with little temporal reuse. Experimental results on PowerPC G5 and Intel Pentium 4 demonstrate the effectiveness of the developed framework.
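The abstract combines two ideas: tiling the transpose for the memory hierarchy, and generating several code versions with a cheap runtime inspector that selects one. The C sketch below illustrates only that general pattern for an out-of-place, row-major transpose. It is not the authors' framework. The names `transpose_tiled_generic`, `transpose_tiled_full`, `select_version`, `run_transpose`, and `is_transpose` are hypothetical, and so is the fixed `TILE` value of 32. The paper instead picks parameters through combined analytical and empirical search. This scalar sketch also leaves out the misalignment handling and SIMD code the abstract mentions. A real inspector would probably check pointer alignment as well as dimensions.

```c
#include <stddef.h>
#include <stdlib.h> /* used by the usage example (malloc/free) */
#include <assert.h> /* used by the usage example */

/* Hypothetical fixed tile size; a tuned framework would choose this per machine. */
#define TILE 32

typedef void (*transpose_fn)(const double *, double *, size_t, size_t);

/* Generic tiled version: B (cols x rows) = transpose of A (rows x cols),
 * both row-major. Clamps tile bounds, so any dimensions are accepted. */
static void transpose_tiled_generic(const double *A, double *B,
                                    size_t rows, size_t cols) {
    for (size_t ii = 0; ii < rows; ii += TILE) {
        size_t imax = (ii + TILE < rows) ? ii + TILE : rows;
        for (size_t jj = 0; jj < cols; jj += TILE) {
            size_t jmax = (jj + TILE < cols) ? jj + TILE : cols;
            for (size_t i = ii; i < imax; i++)
                for (size_t j = jj; j < jmax; j++)
                    B[j * rows + i] = A[i * cols + j];
        }
    }
}

/* Specialized version: valid only when rows and cols are multiples of TILE.
 * Inner loops have fixed trip counts, with no remainder handling. */
static void transpose_tiled_full(const double *A, double *B,
                                 size_t rows, size_t cols) {
    for (size_t ii = 0; ii < rows; ii += TILE)
        for (size_t jj = 0; jj < cols; jj += TILE)
            for (size_t i = ii; i < ii + TILE; i++)
                for (size_t j = jj; j < jj + TILE; j++)
                    B[j * rows + i] = A[i * cols + j];
}

/* Minimal inspector: an O(1) check of problem dimensions at runtime
 * picks which pre-generated version to run. */
static transpose_fn select_version(size_t rows, size_t cols) {
    int full_tiles = (rows % TILE == 0) && (cols % TILE == 0);
    return full_tiles ? transpose_tiled_full : transpose_tiled_generic;
}

static void run_transpose(const double *A, double *B, size_t rows, size_t cols) {
    transpose_fn f = select_version(rows, cols);
    f(A, B, rows, cols);
}

/* Returns 1 if B (cols x rows) equals the transpose of A (rows x cols). */
static int is_transpose(const double *A, const double *B,
                        size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            if (B[j * rows + i] != A[i * cols + j])
                return 0;
    return 1;
}
```

For example, a 64 x 96 matrix dispatches to `transpose_tiled_full`, while a 70 x 45 matrix falls back to `transpose_tiled_generic`. Fixed trip counts give a compiler more room to unroll or vectorize the specialized loop. Whether that version actually runs faster on a given processor has to be measured, which is where the empirical side of such tuning comes in.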
Author Lu, Qingda
Krishnamoorthy, Sriram
Sadayappan, P.
Author_xml – sequence: 1
  givenname: Qingda
  surname: Lu
  fullname: Lu, Qingda
  organization: Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
– sequence: 2
  givenname: Sriram
  surname: Krishnamoorthy
  fullname: Krishnamoorthy, Sriram
  organization: Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
– sequence: 3
  givenname: P.
  surname: Sadayappan
  fullname: Sadayappan, P.
  organization: Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/1152154.1152190
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9781595932648
159593264X
EndPage 242
ExternalDocumentID 7847599
Genre orig-research
GroupedDBID 6IE
6IL
ACM
ALMA_UNASSIGNED_HOLDINGS
APO
CBEJK
GUFHI
LHSKQ
RIE
RIL
ID FETCH-LOGICAL-a193t-fcc8ad272561f559503f02c4362bd0e874e43b6c158be4f15c704db4cca17e1a3
IEDL.DBID RIE
IngestDate Wed Aug 27 01:40:59 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a193t-fcc8ad272561f559503f02c4362bd0e874e43b6c158be4f15c704db4cca17e1a3
PageCount 10
ParticipantIDs ieee_primary_7847599
PublicationCentury 2000
PublicationDate 2006-Sept.
PublicationDateYYYYMMDD 2006-09-01
PublicationDate_xml – month: 09
  year: 2006
  text: 2006-Sept.
PublicationDecade 2000
PublicationTitle PACT 2006 : proceedings of the Fifteenth International Conference on Parallel Architectures and Compilation Techniques : September 16-20, 2006, Seattle, Washington, USA.
PublicationTitleAbbrev PACT
PublicationYear 2006
Publisher ACM
Publisher_xml – name: ACM
SSID ssj0000753003
Score 1.7284065
SourceID ieee
SourceType Publisher
StartPage 233
SubjectTerms Arrays
Bandwidth
bandwidth-limited
conflict misses
empirical search
Instruction sets
Kernel
matrix transposition
Multimedia communication
Optimization
SIMD
spatial locality
tiling
Title Combining analytical and empirical approaches in tuning matrix transposition
URI https://ieeexplore.ieee.org/document/7847599
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV07T8MwED6VioGpQIt4ywMjbuPYjtMZUTGgqgNI3So_zlKHplWbIn4-thPahYXJDzmJ5NdnX-6-D-AJC86dZJ4aRKQBj5FqoXLqfFEYk2ttjU1iE2o6Lefz8awDz4dYmPBIcj7DYcymf_lubffRVDZSZWSnG5_AiVKqidU62FMC9PEwQ1v2HibkiEVokmKY0rTnHuVTEnpMev_77jkMjmF4ZHYAmAvoYHUJvV8dBtIuyz68hyqThB6IjhwjyTwdso7garPcNqWWOxx3ZFmRep9aryI__zepG4LzxntrAJ-T14-XN9qqJFAdDl819daW2uUqnF2YD_cDmXGf5VYEZDIuw1IJFNwUlsnSoPBMWpUJZ0QYOqaQaX4F3Wpd4TUQy5VU1miFthDMm3AX86XQPryJM-vZDfRj5yw2DRHGou2X27-r7-CssVdEh6x76NbbPT7Aqf2ql7vtYxq9Hx7Bnrg
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3NT8IwFH9BNNETKhi_7cGjg3Vt13E2EoxIOGDCjbTda8KBSWAY_3zbbsLFi6d-pNuSfv3at_d-P4BHTBnLBbWRRsTI4TFGisskym2aap0oZbQJYhNyPM5ms_6kAU-7WBj3SHA-w67Phn_5-afZelNZT2aena5_AIeC84RW0Vo7i4oDP-bmaM3fQ7noUQ9OgndDGnbdvYBKwI9B639fPoXOPhCPTHYQcwYNLM6h9avEQOqF2YaRq9JB6oEozzISDNQumxNcrhbrqlSzh-OGLApSbkPrpWfo_yZlRXFe-W914GPwMn0eRrVOQqTc8auMrDGZyhPpTi_UuhuCiJmNE8MdNuk8xkxy5EynhopMI7dUGBnzXHM3eFQiVewCmsVngZdADJNCGq0kmpRTq91tzGZcWfcmRo2lV9D2nTNfVVQY87pfrv-ufoDj4fR9NB-9jt9u4KSyXnj3rFtolust3sGR-SoXm_V9GMkfjjGh_w
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=PACT+2006+%3A+proceedings+of+the+Fifteenth+International+Conference+on+Parallel+Architectures+and+Compilation+Techniques+%3A+September+16-20%2C+2006%2C+Seattle%2C+Washington%2C+USA.&rft.atitle=Combining+analytical+and+empirical+approaches+in+tuning+matrix+transposition&rft.au=Lu%2C+Qingda&rft.au=Krishnamoorthy%2C+Sriram&rft.au=Sadayappan%2C+P.&rft.date=2006-09-01&rft.pub=ACM&rft.spage=233&rft.epage=242&rft_id=info:doi/10.1145%2F1152154.1152190&rft.externalDocID=7847599