A High-Performance CNN Processor Based on FPGA for MobileNets

Convolutional neural networks (CNNs) are widely applied to computer vision tasks. However, standard networks are hard to deploy on embedded devices because of their large number of operations and parameters. MobileNet, the state-of-the-art CNN which adopts depthwis...

Bibliographic details
Published in: International Conference on Field-programmable Logic and Applications, pp. 136-143
Main authors: Wu, Di; Zhang, Yu; Jia, Xijie; Tian, Lu; Li, Tianping; Sui, Lingzhi; Xie, Dongliang; Shan, Yi
Format: Conference paper
Language: English
Published: IEEE, 1 September 2019
ISSN: 1946-1488
Abstract Convolutional neural networks (CNNs) are widely applied to computer vision tasks. However, standard networks are hard to deploy on embedded devices because of their large number of operations and parameters. MobileNet, the state-of-the-art CNN that adopts depthwise separable convolution in place of standard convolution, significantly reduces operations and parameters with only a limited loss in accuracy. This paper proposes a high-performance CNN processor based on FPGA. To improve efficiency, two dedicated computing engines, Conv Engine and Dwcv Engine, were designed for pointwise convolution and depthwise convolution respectively. The scheduling of Conv Engine and Dwcv Engine significantly improves the efficiency of the accelerator. Furthermore, a special architecture called Channel Augmentation improves efficiency in the first layer of MobileNets. The accelerator can be flexibly deployed to various devices with different configurations to balance hardware resources and computational performance. We implemented the accelerator on ZU2 and ZU9 MPSoC FPGAs. Classification on ImageNet achieved 205.3 frames per second (fps) on ZU2 and 809.8 fps on ZU9, a 15.4x speedup on ZU2 and a 60.7x speedup on ZU9 compared to CPU. We also deployed a MobileNet + SSD network on the accelerator for object detection and achieved 31.0 fps on ZU2 and 124.3 fps on ZU9.
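The abstract's claim that depthwise separable convolution reduces operations can be illustrated with a rough multiply-accumulate (MAC) count. The following is a minimal sketch, not the paper's method: the function names and the layer dimensions (3x3 kernel, 256 input and output channels, 14x14 output map) are assumptions chosen only for illustration, and the formulas are the standard textbook counts for a stride-agnostic layer with a given output size.

```python
def standard_conv_macs(k, c_in, c_out, h_out, w_out):
    """MACs for a standard k x k convolution producing an h_out x w_out map."""
    return k * k * c_in * c_out * h_out * w_out


def depthwise_separable_macs(k, c_in, c_out, h_out, w_out):
    """MACs for a k x k depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = k * k * c_in * h_out * w_out   # one k x k filter per input channel
    pointwise = c_in * c_out * h_out * w_out   # 1x1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer dimensions, assumed for illustration only.
k, c_in, c_out, h_out, w_out = 3, 256, 256, 14, 14

std = standard_conv_macs(k, c_in, c_out, h_out, w_out)
sep = depthwise_separable_macs(k, c_in, c_out, h_out, w_out)

# The ratio simplifies algebraically to 1/c_out + 1/k^2.
ratio = sep / std
print(f"standard: {std} MACs, separable: {sep} MACs, ratio: {ratio:.4f}")
```

Under these assumed dimensions the separable form needs roughly one ninth of the operations, which suggests why the pointwise and depthwise stages might each merit a dedicated engine: they have quite different compute patterns.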
Author Xie, Dongliang
Sui, Lingzhi
Li, Tianping
Shan, Yi
Zhang, Yu
Tian, Lu
Jia, Xijie
Wu, Di
Author_xml – sequence: 1
  givenname: Di
  surname: Wu
  fullname: Wu, Di
  organization: Xilinx Inc
– sequence: 2
  givenname: Yu
  surname: Zhang
  fullname: Zhang, Yu
  organization: Xilinx Inc
– sequence: 3
  givenname: Xijie
  surname: Jia
  fullname: Jia, Xijie
  organization: Xilinx Inc
– sequence: 4
  givenname: Lu
  surname: Tian
  fullname: Tian, Lu
  organization: Xilinx Inc, Tsinghua University
– sequence: 5
  givenname: Tianping
  surname: Li
  fullname: Li, Tianping
  organization: Xilinx Inc
– sequence: 6
  givenname: Lingzhi
  surname: Sui
  fullname: Sui, Lingzhi
  organization: Xilinx Inc
– sequence: 7
  givenname: Dongliang
  surname: Xie
  fullname: Xie, Dongliang
  organization: Xilinx Inc
– sequence: 8
  givenname: Yi
  surname: Shan
  fullname: Shan, Yi
  organization: Xilinx Inc
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/FPL.2019.00030
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEL
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9781728148847
1728148847
EISSN 1946-1488
EndPage 143
ExternalDocumentID 8891993
Genre orig-research
GroupedDBID 6IE
6IF
6IL
6IN
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
OCL
RIE
RIL
IEDL.DBID RIE
ISICitedReferencesCount 107
IngestDate Wed Aug 27 02:43:29 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 8
ParticipantIDs ieee_primary_8891993
PublicationCentury 2000
PublicationDate 2019-09-01
PublicationDateYYYYMMDD 2019-09-01
PublicationDate_xml – month: 09
  year: 2019
  text: 2019-09-01
  day: 01
PublicationDecade 2010
PublicationTitle International Conference on Field-programmable Logic and Applications
PublicationTitleAbbrev FPL
PublicationYear 2019
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0000547856
Score 2.090716
SourceID ieee
SourceType Publisher
StartPage 136
SubjectTerms Acceleration
Computational modeling
Convolution
convolution neural network
Engines
Field programmable gate arrays
FPGA
hardware accelerator
MobileNet
Schedules
Title A High-Performance CNN Processor Based on FPGA for MobileNets
URI https://ieeexplore.ieee.org/document/8891993
hasFullText 1
inHoldings 1
isFullTextHit
isPrint