Automatic generation of ARM NEON micro-kernels for matrix multiplication
General matrix multiplication (gemm) is a fundamental kernel in scientific computing and current frameworks for deep learning. Modern realisations of gemm are mostly written in C, on top of a small, highly tuned micro-kernel that is usually encoded in assembly. The high performance realisation of...
| Published in: | The Journal of Supercomputing, Vol. 80, No. 10, pp. 13873-13899 |
|---|---|
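| Main authors: | Guillermo Alaejos, Héctor Martínez, Adrián Castelló, Manuel F. Dolz, Francisco D. Igual, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí |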
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US; Springer Nature B.V., 1 July 2024 |
| Subjects: | |
| ISSN: | 0920-8542, 1573-0484 |
| Online access: | Full text |
| Abstract | General matrix multiplication (gemm) is a fundamental kernel in scientific computing and current frameworks for deep learning. Modern realisations of gemm are mostly written in C, on top of a small, highly tuned micro-kernel that is usually encoded in assembly. The high performance realisation of gemm in linear algebra libraries in general include a single micro-kernel per architecture, usually implemented by an expert. In this paper, we explore a couple of paths to automatically generate gemm micro-kernels, either using C++ templates with vector intrinsics or high-level Python scripts that directly produce assembly code. Both solutions can integrate high performance software techniques, such as loop unrolling and software pipelining, accommodate any data type, and easily generate micro-kernels of any requested dimension. The performance of this solution is tested on three ARM-based cores and compared with state-of-the-art libraries for these processors: BLIS, OpenBLAS and ArmPL. The experimental results show that the auto-generation approach is highly competitive, mainly due to the possibility of adapting the micro-kernel to the problem dimensions. |
|---|---|
| Author | Alonso-Jordá, Pedro; Dolz, Manuel F.; Martínez, Héctor; Castelló, Adrián; Quintana-Ortí, Enrique S.; Igual, Francisco D.; Alaejos, Guillermo |
| Author_xml | 1. Guillermo Alaejos (Universitat Politècnica de València); 2. Héctor Martínez (Universidad de Córdoba); 3. Adrián Castelló (Universitat Politècnica de València, adcastel@disca.upv.es); 4. Manuel F. Dolz (Universitat Jaume I de Castelló); 5. Francisco D. Igual (Universidad Complutense de Madrid); 6. Pedro Alonso-Jordá (Universitat Politècnica de València); 7. Enrique S. Quintana-Ortí (Universitat Politècnica de València) |
| ContentType | Journal Article |
| Copyright | The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.1007/s11227-024-05955-8 |
| Discipline | Computer Science |
| EISSN | 1573-0484 |
| EndPage | 13899 |
| ExternalDocumentID | 10_1007_s11227_024_05955_8 |
| GrantInformation_xml | European Union (grant 95555); Universitat Politècnica de València; European Commission (grant 95555); Junta de Andalucía (grant POSTDOC_21_00025, funder ID http://dx.doi.org/10.13039/501100011011); Generalitat Valenciana (grants CIDEXG/2022/013; PROMETEO 2023-CIPROM/2022/2); Agencia Estatal de Investigación (grants FJC2019-039222; PID2020-113656R) |
| ISICitedReferencesCount | 2 |
| ISSN | 0920-8542 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 10 |
| Keywords | SIMD arithmetic units; High performance; Matrix multiplication; ARM NEON |
| Language | English |
| OpenAccessLink | https://link.springer.com/10.1007/s11227-024-05955-8 |
| PQID | 3256589567 |
| PQPubID | 2043774 |
| PageCount | 27 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-07 |
| PublicationDateYYYYMMDD | 2024-07-01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationSubtitle | An International Journal of High-Performance Computer Design, Analysis, and Use |
| PublicationTitle | The Journal of supercomputing |
| PublicationTitleAbbrev | J Supercomput |
| PublicationYear | 2024 |
| Publisher | Springer US; Springer Nature B.V. |
| SourceID | proquest crossref springer |
| SourceType | Aggregation Database Index Database Publisher |
| StartPage | 13873 |
| SubjectTerms | Algorithms; Compilers; Computer Science; Deep learning; Interpreters; Linear algebra; Matrices (mathematics); Matrix algebra; Multiplication; Neon; Performance evaluation; Processor Architectures; Programming Languages; Python; Software |
| Title | Automatic generation of ARM NEON micro-kernels for matrix multiplication |
| URI | https://link.springer.com/article/10.1007/s11227-024-05955-8 https://www.proquest.com/docview/3256589567 |
| Volume | 80 |
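
The abstract describes generating gemm micro-kernels of any requested dimension from high-level scripts, with techniques such as loop unrolling. The sketch below illustrates that general idea only; it is not the authors' generator. It emits C source with NEON intrinsics rather than assembly, and the function name `generate_microkernel`, the kernel name `ukernel_MRxNR`, the packed layouts of `A` and `B`, and the column-major layout of `C` are all assumptions made for illustration.

```python
def generate_microkernel(mr, nr, unroll=1):
    """Return C source for an mr x nr FP32 micro-kernel using NEON intrinsics.

    Assumed (hypothetical) layout: A is packed so that each k-step holds mr
    contiguous values, B so that each k-step holds nr contiguous values, and
    C is column-major with leading dimension ldc.
    """
    lanes = 4  # FP32 values per 128-bit NEON register
    if mr <= 0 or nr <= 0 or unroll <= 0:
        raise ValueError("mr, nr and unroll must be positive")
    if mr % lanes != 0:
        raise ValueError("mr must be a multiple of 4 for FP32 NEON vectors")
    vecs = mr // lanes  # vector registers needed per column of the C tile

    def step(indent):
        # One rank-1 update: load a column slice of A, then accumulate it
        # scaled by each scalar of the current row of B.
        body = [f"{indent}a{i} = vld1q_f32(A + {lanes * i});" for i in range(vecs)]
        body += [
            f"{indent}c{i}_{j} = vfmaq_n_f32(c{i}_{j}, a{i}, B[{j}]);"
            for j in range(nr) for i in range(vecs)
        ]
        body.append(f"{indent}A += {mr}; B += {nr};")
        return body

    src = [
        "#include <arm_neon.h>",
        "",
        f"void ukernel_{mr}x{nr}(int kc, const float *A, const float *B,",
        "             float *C, int ldc) {",
        "    float32x4_t " + ", ".join(f"a{i}" for i in range(vecs)) + ";",
    ]
    # Accumulators for the C tile start at zero and stay in registers.
    src += [
        f"    float32x4_t c{i}_{j} = vdupq_n_f32(0.0f);"
        for j in range(nr) for i in range(vecs)
    ]
    src.append("    int k = 0;")
    if unroll > 1:
        # Main loop with the body replicated `unroll` times.
        src.append(f"    for (; k + {unroll} <= kc; k += {unroll}) {{")
        for _ in range(unroll):
            src += step("        ")
        src.append("    }")
    # Remainder loop (the only loop when unroll == 1).
    src.append("    for (; k < kc; ++k) {")
    src += step("        ")
    src.append("    }")
    # Add the accumulated tile into C.
    src += [
        f"    vst1q_f32(C + {j} * ldc + {lanes * i}, "
        f"vaddq_f32(vld1q_f32(C + {j} * ldc + {lanes * i}), c{i}_{j}));"
        for j in range(nr) for i in range(vecs)
    ]
    src.append("}")
    return "\n".join(src) + "\n"
```

Calling `print(generate_microkernel(8, 12, unroll=4))` writes out an 8 x 12 kernel that could then be compiled with an AArch64 toolchain; the emitted C has not been benchmarked here, and a production generator would also need software pipelining, other data types, and handling of edge tiles.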