Automatic generation of ARM NEON micro-kernels for matrix multiplication


Detailed description

Bibliographic details
Published in: The Journal of Supercomputing, Vol. 80, No. 10, pp. 13873-13899
Main authors: Alaejos, Guillermo; Martínez, Héctor; Castelló, Adrián; Dolz, Manuel F.; Igual, Francisco D.; Alonso-Jordá, Pedro; Quintana-Ortí, Enrique S.
Format: Journal Article
Language: English
Published: New York: Springer US, 1 July 2024
Springer Nature B.V
ISSN: 0920-8542, 1573-0484
Abstract: General matrix multiplication (gemm) is a fundamental kernel in scientific computing and in current deep learning frameworks. Modern realisations of gemm are mostly written in C, on top of a small, highly tuned micro-kernel that is usually encoded in assembly. High-performance realisations of gemm in linear algebra libraries generally include a single micro-kernel per architecture, usually implemented by an expert. In this paper, we explore two paths to automatically generate gemm micro-kernels: C++ templates with vector intrinsics, or high-level Python scripts that directly produce assembly code. Both solutions can integrate high-performance software techniques, such as loop unrolling and software pipelining, accommodate any data type, and easily generate micro-kernels of any requested dimension. The performance of this approach is tested on three ARM-based cores and compared with state-of-the-art libraries for these processors: BLIS, OpenBLAS and ArmPL. The experimental results show that the auto-generation approach is highly competitive, mainly due to the possibility of adapting the micro-kernel to the problem dimensions.
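To make the idea of generating a micro-kernel "of any requested dimension" more concrete, the following is a minimal sketch and not the authors' generator. It assumes the usual micro-kernel formulation, C += A_panel * B_panel computed as a sequence of rank-1 updates over packed panels, and it emits plain Python source rather than ARM NEON assembly so that it can be run anywhere. The names `generate_microkernel`, `build_microkernel`, `mr`, `nr` and `kc` are illustrative choices, not identifiers taken from the paper.

```python
def generate_microkernel(mr, nr, name="ukernel"):
    """Return Python source for a fully unrolled mr x nr micro-kernel.

    The generated function performs C += A_panel * B_panel as kc rank-1
    updates. A_panel is assumed packed so that the mr values for step k
    sit at a[k*mr : (k+1)*mr]; B_panel likewise holds nr values per k.
    """
    lines = [f"def {name}(kc, a, b, c):"]
    # Load the mr x nr block of C into scalar accumulators ("registers").
    for i in range(mr):
        for j in range(nr):
            lines.append(f"    c{i}_{j} = c[{i}][{j}]")
    lines.append("    for k in range(kc):")
    # Fetch one column of A_panel and one row of B_panel per iteration.
    for i in range(mr):
        lines.append(f"        a{i} = a[k * {mr} + {i}]")
    for j in range(nr):
        lines.append(f"        b{j} = b[k * {nr} + {j}]")
    # Fully unrolled rank-1 update: mr * nr multiply-adds, no inner loops.
    for i in range(mr):
        for j in range(nr):
            lines.append(f"        c{i}_{j} += a{i} * b{j}")
    # Write the accumulators back to C.
    for i in range(mr):
        for j in range(nr):
            lines.append(f"    c[{i}][{j}] = c{i}_{j}")
    return "\n".join(lines) + "\n"


def build_microkernel(mr, nr):
    """Generate the source for an mr x nr micro-kernel and compile it."""
    namespace = {}
    exec(generate_microkernel(mr, nr), namespace)
    return namespace["ukernel"]
```

Calling `build_microkernel(2, 3)` returns a function `ukernel(kc, a, b, c)` that updates a 2 x 3 block `c` (a list of lists) in place. A real generator of the kind the abstract describes would emit vector loads, fused multiply-adds and stores instead of scalar statements, but the structure (accumulators held across the `kc` loop, body unrolled over `mr` and `nr`) would be similar under the assumptions stated above.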
Author_xml – sequence: 1
  givenname: Guillermo
  surname: Alaejos
  fullname: Alaejos, Guillermo
  organization: Universitat Politècnica de València
– sequence: 2
  givenname: Héctor
  surname: Martínez
  fullname: Martínez, Héctor
  organization: Universidad de Córdoba
– sequence: 3
  givenname: Adrián
  surname: Castelló
  fullname: Castelló, Adrián
  email: adcastel@disca.upv.es
  organization: Universitat Politècnica de València
– sequence: 4
  givenname: Manuel F.
  surname: Dolz
  fullname: Dolz, Manuel F.
  organization: Universitat Jaume I de Castelló
– sequence: 5
  givenname: Francisco D.
  surname: Igual
  fullname: Igual, Francisco D.
  organization: Universidad Complutense de Madrid
– sequence: 6
  givenname: Pedro
  surname: Alonso-Jordá
  fullname: Alonso-Jordá, Pedro
  organization: Universitat Politècnica de València
– sequence: 7
  givenname: Enrique S.
  surname: Quintana-Ortí
  fullname: Quintana-Ortí, Enrique S.
  organization: Universitat Politècnica de València
ContentType Journal Article
Copyright The Author(s) 2024
The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1007/s11227-024-05955-8
Discipline Computer Science
EISSN 1573-0484
EndPage 13899
ExternalDocumentID 10_1007_s11227_024_05955_8
GrantInformation_xml – fundername: European Union
  grantid: 95555
– fundername: Universitat Politècnica de València
– fundername: European Commission
  grantid: 95555
– fundername: Junta de Andalucía
  grantid: POSTDOC_21_00025
  funderid: http://dx.doi.org/10.13039/501100011011
– fundername: Generalitat Valenciana
  grantid: CIDEXG/2022/013; PROMETEO 2023-CIPROM/2022/2
– fundername: Agencia Estatal de Investigación
  grantid: FJC2019-039222; PID2020-113656R
ISSN 0920-8542
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 10
Keywords SIMD arithmetic units
High performance
Matrix multiplication
ARM NEON
Language English
OpenAccessLink https://link.springer.com/10.1007/s11227-024-05955-8
PageCount 27
PublicationDate 20240700
2024-07-00
20240701
PublicationDateYYYYMMDD 2024-07-01
PublicationDate_xml – month: 7
  year: 2024
  text: 20240700
PublicationDecade 2020
PublicationPlace New York
PublicationSubtitle An International Journal of High-Performance Computer Design, Analysis, and Use
PublicationTitle The Journal of supercomputing
PublicationTitleAbbrev J Supercomput
PublicationYear 2024
Publisher Springer US
Springer Nature B.V
StartPage 13873
SubjectTerms Algorithms
Compilers
Computer Science
Deep learning
Interpreters
Linear algebra
Matrices (mathematics)
Matrix algebra
Multiplication
Neon
Performance evaluation
Processor Architectures
Programming Languages
Python
Software
Title Automatic generation of ARM NEON micro-kernels for matrix multiplication
URI https://link.springer.com/article/10.1007/s11227-024-05955-8
https://www.proquest.com/docview/3256589567
Volume 80