Fast matrix multiplication for binary and ternary CNNs on ARM CPU
| Published in: | International Conference on Pattern Recognition, pp. 3176-3182 |
|---|---|
| Main authors: | Anton Trusov, Elena Limonova, Dmitry Nikolaev, Vladimir V. Arlazarov |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 21 Aug 2022 |
| ISSN: | 2831-7475 |
| Online access: | Full text |
| Abstract | Low-bit quantized neural networks (QNNs) are of great interest in practical applications because they significantly reduce the consumption of both memory and computational resources. Binary neural networks (BNNs) are memory- and computation-efficient, as they require only one bit per weight and activation and can be computed using Boolean logic and bit-count operations. QNNs with ternary weights and activations (TNNs) and with binary weights and ternary activations (TBNs) aim to improve recognition quality compared to BNNs while preserving a low bit-width. However, their efficient implementation is usually considered only on ASICs and FPGAs, limiting their applicability in real-life tasks. At the same time, one of the areas where efficient recognition is most in demand is recognition on mobile devices using their CPUs. However, there are no known fast implementations of TNNs and TBNs; only the daBNN library provides fast BNN inference. In this paper, we propose novel fast algorithms for ternary, ternary-binary, and binary matrix multiplication on mobile devices with the ARM architecture. In our algorithms, ternary weights are represented using a 2-bit encoding and binary weights using a single bit. This allows us to replace matrix multiplication with Boolean logic operations that can be computed on 128 bits at a time using the ARM NEON SIMD extension. The matrix multiplication results are accumulated in 16-bit integer registers. We also use a special reordering of values in the left and right matrices. Together, this lets us compute a matrix product efficiently while minimizing the number of loads and stores compared to the algorithm from daBNN. Our algorithms can be used to implement inference of convolutional and fully connected layers of TNNs, TBNs, and BNNs. We evaluate them experimentally on an ARM Cortex-A73 CPU and compare their inference speed to efficient implementations of full-precision, 8-bit, and 4-bit quantized matrix multiplications. Our experiments show that our implementations of ternary and ternary-binary matrix multiplication have almost the same inference time; they are 3.6 times faster than full-precision, 2.5 times faster than 8-bit quantized, and 1.4 times faster than 4-bit quantized matrix multiplication, but 2.9 times slower than binary matrix multiplication. |
|---|---|
| Author | Anton Trusov (Moscow Institute of Physics and Technology, Dolgoprudny, Russia); Elena Limonova (Federal Research Center "Computer Science and Control" of Russian Academy of Sciences, Moscow, Russia); Dmitry Nikolaev (Smart Engines Service LLC, Moscow, Russia); Vladimir V. Arlazarov (Federal Research Center "Computer Science and Control" of Russian Academy of Sciences, Moscow, Russia) |
| ContentType | Conference Proceeding |
| DOI | 10.1109/ICPR56361.2022.9956533 |
| EISBN | 1665490624 9781665490627 |
| EISSN | 2831-7475 |
| EndPage | 3182 |
| Genre | orig-research |
| GrantInformation | Russian Foundation for Basic Research (funder ID: 10.13039/501100002261) |
| Language | English |
| PageCount | 7 |
| PublicationTitle | International Conference on Pattern Recognition |
| PublicationTitleAbbrev | ICPR |
| StartPage | 3176 |
| SubjectTerms | Computational efficiency; Inference algorithms; Libraries; Memory management; Mobile handsets; Neon; Neural networks |
| URI | https://ieeexplore.ieee.org/document/9956533 |
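The bit-level trick described in the abstract can be illustrated in scalar form: ternary values in {-1, 0, +1} are packed into two bitmasks (one marking +1 positions, one marking -1 positions), binary values in {-1, +1} into a single bitmask, and a dot product then reduces to population counts of bitwise-combined masks. The following is a minimal Python sketch of that reduction only; it does not reproduce the paper's NEON implementation, data reordering, or 16-bit accumulation scheme, and all function names are illustrative.

```python
# Scalar sketch of binary and ternary-binary dot products via bit operations,
# illustrating the Boolean-logic reduction described in the abstract.

def encode_binary(vals):
    """Pack {-1, +1} values into a bitmask: bit i set <=> vals[i] == +1."""
    mask = 0
    for i, v in enumerate(vals):
        if v == +1:
            mask |= 1 << i
    return mask

def encode_ternary(vals):
    """Pack {-1, 0, +1} values into two bitmasks (a 2-bit encoding)."""
    plus = minus = 0
    for i, v in enumerate(vals):
        if v == +1:
            plus |= 1 << i
        elif v == -1:
            minus |= 1 << i
    return plus, minus

def popcount(x):
    return bin(x).count("1")

def dot_binary(a_mask, b_mask, n):
    """a . b for two binary vectors: n - 2 * popcount(a XOR b)."""
    return n - 2 * popcount((a_mask ^ b_mask) & ((1 << n) - 1))

def dot_ternary_binary(plus, minus, b_mask):
    """a . b for ternary a (plus/minus masks) and binary b, via popcounts.

    Over the +1 positions of a, b contributes 2*pc(plus & b) - pc(plus);
    over the -1 positions it contributes -(2*pc(minus & b) - pc(minus)).
    """
    return (2 * (popcount(plus & b_mask) - popcount(minus & b_mask))
            - popcount(plus) + popcount(minus))

a = [+1, -1, 0, +1, -1, 0, 0, +1]     # ternary vector
b = [+1, +1, -1, -1, +1, -1, +1, +1]  # binary vector
plus, minus = encode_ternary(a)
ref = sum(x * y for x, y in zip(a, b))
assert dot_ternary_binary(plus, minus, encode_binary(b)) == ref
```

In the paper's setting these popcounts run over 128-bit NEON registers instead of Python integers, which is what makes the replacement of multiply-accumulate by Boolean logic profitable.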