Combining analytical and empirical approaches in tuning matrix transposition
Matrix transposition is an important kernel used in many applications. Even though its optimization has been the subject of many studies, an optimization procedure that targets the characteristics of current processor architectures has not been developed. In this paper, we develop an integrated opti...
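The full abstract in this record names tiling for the memory hierarchy and an empirical search over candidate optimizations. The sketch below illustrates those two ideas only; it is not the authors' framework. The function names `transpose_tiled` and `pick_tile`, the candidate tile sizes, and the flat row-major list layout are all assumptions made for illustration.

```python
import time


def transpose_tiled(src, rows, cols, tile=32):
    """Transpose a row-major rows x cols matrix stored in a flat list.

    The outer loops walk tile x tile blocks, so each block reads a
    bounded set of source rows and writes a bounded set of destination rows.
    """
    dst = [0] * (rows * cols)
    for i0 in range(0, rows, tile):
        i_end = min(i0 + tile, rows)  # clip the last, partial tile
        for j0 in range(0, cols, tile):
            j_end = min(j0 + tile, cols)
            for i in range(i0, i_end):
                base = i * cols
                for j in range(j0, j_end):
                    # Element (i, j) of src becomes element (j, i) of dst.
                    dst[j * rows + i] = src[base + j]
    return dst


def pick_tile(rows, cols, candidates=(8, 16, 32, 64), repeats=3):
    """Hypothetical empirical search: time each tile size, return the fastest."""
    src = list(range(rows * cols))
    best_tile, best_time = None, float("inf")
    for tile in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            transpose_tiled(src, rows, cols, tile)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_tile, best_time = tile, elapsed
    return best_tile
```

In pure Python, interpreter overhead dominates the timings, so `pick_tile` shows only the shape of an empirical search; cache effects of the kind the abstract refers to would need a compiled kernel to observe.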
| Published in: | PACT 2006 : proceedings of the Fifteenth International Conference on Parallel Architectures and Compilation Techniques : September 16-20, 2006, Seattle, Washington, USA. pp. 233-242 |
|---|---|
| Main authors: | Lu, Qingda; Krishnamoorthy, Sriram; Sadayappan, P. |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 1 September 2006 |
| Abstract | Matrix transposition is an important kernel used in many applications. Even though its optimization has been the subject of many studies, an optimization procedure that targets the characteristics of current processor architectures has not been developed. In this paper, we develop an integrated optimization framework that addresses a number of issues, including tiling for the memory hierarchy, effective handling of memory misalignment, utilizing memory subsystem characteristics, and the exploitation of the parallelism provided by the vector instruction sets in current processors. A judicious combination of analytical and empirical approaches is used to determine the most appropriate optimizations. The absence of problem information until execution time is handled by generating multiple versions of the code - the best version is chosen at runtime, with assistance from minimal-overhead inspectors. The approach highlights aspects of empirical optimization that are important for similar computations with little temporal reuse. Experimental results on PowerPC G5 and Intel Pentium 4 demonstrate the effectiveness of the developed framework. |
|---|---|
| Author | Lu, Qingda; Krishnamoorthy, Sriram; Sadayappan, P. |
| Author_xml | 1. Lu, Qingda; 2. Krishnamoorthy, Sriram; 3. Sadayappan, P. (all: Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA) |
| ContentType | Conference Proceeding |
| DOI | 10.1145/1152154.1152190 |
| Discipline | Computer Science |
| EISBN | 9781595932648 159593264X |
| EndPage | 242 |
| ExternalDocumentID | 7847599 |
| Genre | orig-research |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 10 |
| PublicationDate | 2006-09-01 |
| PublicationTitle | PACT 2006 : proceedings of the Fifteenth International Conference on Parallel Architectures and Compilation Techniques : September 16-20, 2006, Seattle, Washington, USA. |
| PublicationTitleAbbrev | PACT |
| PublicationYear | 2006 |
| Publisher | ACM |
| StartPage | 233 |
| SubjectTerms | Arrays; Bandwidth; bandwidth-limited; conflict misses; empirical search; Instruction sets; Kernel; matrix transposition; Multimedia communication; Optimization; SIMD; spatial locality; tiling |
| Title | Combining analytical and empirical approaches in tuning matrix transposition |
| URI | https://ieeexplore.ieee.org/document/7847599 |