A high-performance matrix transposition for a new MIMD architecture processor PEZY-SC3s
| Published in: | CCF transactions on high performance computing (Online), Vol. 7, No. 4, pp. 323-335 |
|---|---|
| Main authors: | Liang, Yaling; Wang, Qinglin; Yang, Shun; Xia, Rui; Guo, Weihao; Liu, Jie |
| Format: | Journal Article |
| Language: | English |
| Published: | Beijing: Springer Nature B.V, 1 August 2025 |
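| Subjects: | Algorithms; Bandwidths; Computation; Design; Energy efficiency; Heuristic methods; High performance computing; Microkernels; Microprocessors; MIMD (computers); Optimization; Performance enhancement; Resource utilization; Supercomputers |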
| ISSN: | 2524-4922, 2524-4930 |
| Abstract | Matrix transposition is a vital kernel widely used in various fields. However, its memory-intensive nature leads to significant memory access conflicts, making it a performance bottleneck. Therefore, optimizing matrix transposition algorithms based on architectural features is crucial for improving the performance of related applications and enhancing system resource utilization. The PEZY-SC3s, a new MIMD (Multiple Instruction Multiple Data) architecture processor, possesses numerous cores and supports SIMD instructions, demonstrating tremendous potential for high-performance computing. However, no matrix transposition algorithm currently exists tailored to the PEZY-SC3s architecture to leverage its computing potential fully. We propose a high-performance matrix transposition algorithm for PEZY-SC3s. First, we block the matrix according to the cache architecture at the microkernel level to improve the memory access pattern. Then, we separate read and write operations by utilizing the PEZY-SC3s’ Local Memory, solving the cache line contention. Finally, we design various processor-level parallel strategies and implement a dynamic selection strategy based on a performance heuristic algorithm for different matrix shapes, alleviating bank conflict and enhancing performance. Experimental results show that our implementation achieves an average speedup of 17.27 times across 60 matrices compared to the baseline algorithm, with a maximum bandwidth utilization of 87.7%. |
|---|---|
| Author | Liang, Yaling (ORCID 0009-0001-1207-2170); Wang, Qinglin; Yang, Shun; Xia, Rui; Guo, Weihao; Liu, Jie |
| ContentType | Journal Article |
| Copyright | China Computer Federation (CCF) 2025. |
| DOI | 10.1007/s42514-025-00231-4 |
| DatabaseName | CrossRef ProQuest Computer Science Collection |
| EISSN | 2524-4930 |
| EndPage | 335 |
| ISICitedReferencesCount | 0 |
| ISSN | 2524-4922 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4 |
| Language | English |
| ORCID | 0009-0001-1207-2170 |
| PageCount | 13 |
| PublicationDateYYYYMMDD | 2025-08-01 |
| PublicationPlace | Beijing |
| PublicationTitle | CCF transactions on high performance computing (Online) |
| PublicationYear | 2025 |
| Publisher | Springer Nature B.V |
| StartPage | 323 |
| SubjectTerms | Algorithms; Bandwidths; Computation; Design; Energy efficiency; Heuristic methods; High performance computing; Microkernels; Microprocessors; MIMD (computers); Optimization; Performance enhancement; Resource utilization; Supercomputers |
| Title | A high-performance matrix transposition for a new MIMD architecture processor PEZY-SC3s |
| URI | https://www.proquest.com/docview/3241080259 |
| Volume | 7 |
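
The abstract describes blocking the matrix to fit the cache and separating read and write operations through a local buffer. The following is a minimal sketch of that general idea in plain C for a generic CPU. It is not the authors' PEZY-SC3s kernel: the function name `transpose_blocked`, the tile size `TILE`, and the on-stack staging buffer standing in for Local Memory are all assumptions made for illustration.

```c
#include <stddef.h>

#define TILE 16  /* hypothetical tile edge; a real kernel would tune this to the cache */

/* Transpose a rows x cols row-major matrix `src` into the cols x rows
 * row-major matrix `dst`. Each tile is first read into a small staging
 * buffer (a stand-in for a scratchpad such as Local Memory) and only then
 * written out, so the reads and writes of a tile are separated. */
void transpose_blocked(const double *src, double *dst, size_t rows, size_t cols)
{
    double tile[TILE][TILE];

    for (size_t i0 = 0; i0 < rows; i0 += TILE) {
        /* Height of this tile, clipped at the matrix edge. */
        size_t ih = (rows - i0 < TILE) ? rows - i0 : TILE;
        for (size_t j0 = 0; j0 < cols; j0 += TILE) {
            /* Width of this tile, clipped at the matrix edge. */
            size_t jw = (cols - j0 < TILE) ? cols - j0 : TILE;

            /* Read phase: row-wise, contiguous loads from src. */
            for (size_t i = 0; i < ih; i++)
                for (size_t j = 0; j < jw; j++)
                    tile[i][j] = src[(i0 + i) * cols + (j0 + j)];

            /* Write phase: row-wise, contiguous stores to dst. */
            for (size_t j = 0; j < jw; j++)
                for (size_t i = 0; i < ih; i++)
                    dst[(j0 + j) * rows + (i0 + i)] = tile[i][j];
        }
    }
}
```

Both phases touch main memory along contiguous rows; the strided access is confined to the small staging buffer. The processor-level parallel strategies and the heuristic selection between them that the abstract mentions are not represented in this sketch.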