Fast matrix multiplication for binary and ternary CNNs on ARM CPU


Detailed description

Saved in:
Bibliographic details
Published in: International Conference on Pattern Recognition, pp. 3176-3182
Main authors: Trusov, Anton; Limonova, Elena; Nikolaev, Dmitry; Arlazarov, Vladimir V.
Format: Conference proceeding
Language: English
Published: IEEE, Aug. 21, 2022
Subject terms:
ISSN: 2831-7475
Online access: Full text
Abstract Low-bit quantized neural networks (QNNs) are of great interest in practical applications because they significantly reduce the consumption of both memory and computational resources. Binary neural networks (BNNs) are memory- and computationally efficient, as they require only one bit per weight and activation and can be computed using Boolean logic and bit-count operations. QNNs with ternary weights and activations (TNNs) and with binary weights and ternary activations (TBNs) aim to improve recognition quality compared to BNNs while preserving a low bit-width. However, their efficient implementation is usually considered only on ASICs and FPGAs, limiting their applicability in real-life tasks. At the same time, one of the areas where efficient recognition is most in demand is recognition on mobile devices using their CPUs. However, there are no known fast implementations of TBNs and TNNs, only the daBNN library for BNN inference. In this paper, we propose novel fast algorithms for ternary, ternary-binary, and binary matrix multiplication on mobile devices with the ARM architecture. In our algorithms, ternary weights are represented using a 2-bit encoding and binary weights using a single bit. This allows us to replace matrix multiplication with Boolean logic operations that can be computed on 128 bits simultaneously using the ARM NEON SIMD extension. The matrix multiplication results are accumulated in 16-bit integer registers. We also use a special reordering of values in the left and right matrices. All of this allows us to compute a matrix product efficiently while minimizing the number of loads and stores compared to the algorithm from daBNN. Our algorithms can be used to implement inference of the convolutional and fully connected layers of TNNs, TBNs, and BNNs. We evaluate them experimentally on an ARM Cortex-A73 CPU and compare their inference speed to efficient implementations of full-precision, 8-bit, and 4-bit quantized matrix multiplications.
Our experiments show that our implementations of ternary and ternary-binary matrix multiplication have almost the same inference time; they are 3.6 times faster than full-precision, 2.5 times faster than 8-bit quantized, and 1.4 times faster than 4-bit quantized matrix multiplication, but 2.9 times slower than binary matrix multiplication.
Author Limonova, Elena
Nikolaev, Dmitry
Trusov, Anton
Arlazarov, Vladimir V.
Author_xml – sequence: 1
  givenname: Anton
  surname: Trusov
  fullname: Trusov, Anton
  email: trusov.av@smartengines.ru
  organization: Moscow Institute of Physics and Technology,Dolgoprudny,Russia
– sequence: 2
  givenname: Elena
  surname: Limonova
  fullname: Limonova, Elena
  email: limonova@smartengines.com
  organization: Federal Research Center "Computer Science and Control" of Russian Academy of Sciences,Moscow,Russia
– sequence: 3
  givenname: Dmitry
  surname: Nikolaev
  fullname: Nikolaev, Dmitry
  email: dimonstr@iitp.ru
  organization: Smart Engines Service LLC,Moscow,Russia
– sequence: 4
  givenname: Vladimir V.
  surname: Arlazarov
  fullname: Arlazarov, Vladimir V.
  email: vva777@gmail.com
  organization: Federal Research Center "Computer Science and Control" of Russian Academy of Sciences,Moscow,Russia
ContentType Conference Proceeding
DOI 10.1109/ICPR56361.2022.9956533
Discipline Engineering
Computer Science
EISBN 1665490624
9781665490627
EISSN 2831-7475
EndPage 3182
ExternalDocumentID 9956533
Genre orig-research
GrantInformation_xml – fundername: Russian Foundation for Basic Research
  funderid: 10.13039/501100002261
IsPeerReviewed false
IsScholarly true
Language English
PageCount 7
ParticipantIDs ieee_primary_9956533
PublicationCentury 2000
PublicationDate 2022-Aug.-21
PublicationDateYYYYMMDD 2022-08-21
PublicationDate_xml – month: 08
  year: 2022
  text: 2022-Aug.-21
  day: 21
PublicationDecade 2020
PublicationTitle International Conference on Pattern Recognition
PublicationTitleAbbrev ICPR
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 3176
SubjectTerms Computational efficiency
Inference algorithms
Libraries
Memory management
Mobile handsets
Neon
Neural networks
Title Fast matrix multiplication for binary and ternary CNNs on ARM CPU
URI https://ieeexplore.ieee.org/document/9956533