A high performance FPGA-based accelerator for large-scale convolutional neural networks

Detailed bibliography
Published in: International Conference on Field-programmable Logic and Applications, pp. 1-9
Main authors: Huimin Li, Xitian Fan, Li Jiao, Wei Cao, Xuegong Zhou, Lingli Wang
Format: Conference paper
Language: English
Published: EPFL, 1 August 2016
ISSN: 1946-1488
Abstract In recent years, machine learning algorithms based on convolutional neural networks (CNNs) have been widely applied in computer vision. However, large-scale CNNs are computation-intensive, memory-intensive and resource-hungry, which makes them challenging to implement. This work proposes an end-to-end FPGA-based CNN accelerator with all layers mapped onto one chip, so that different layers can work concurrently in a pipelined structure to increase throughput. A methodology that finds an optimized parallelism strategy for each layer is proposed to achieve high throughput and high resource utilization. In addition, because fully connected (FC) layers are memory-intensive, a batch-based computing method is applied to them to increase memory bandwidth utilization. Further, by applying two different computing patterns to the FC layers, the required on-chip buffer size is reduced significantly. As a case study, a state-of-the-art large-scale CNN, AlexNet, is implemented on a Xilinx VC709 board. It achieves a peak performance of 565.94 GOP/s and 391 FPS at a 156 MHz clock frequency, outperforming previous approaches.
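In a layer-pipelined accelerator, every layer works on a different frame at the same time, so the steady-state frame rate is bounded by the slowest layer. The sketch below illustrates that constraint with a simple greedy heuristic that repeatedly gives one more MAC unit to the current bottleneck layer under a fixed budget. It is an assumption-based illustration, not the authors' parallelism-search methodology: the function names are hypothetical, each layer's work is reduced to a single MAC count, and real constraints such as loop-tiling divisibility, DSP/BRAM limits and memory bandwidth are ignored.

```python
import math


def stage_cycles(macs, parallelism):
    """Cycles one pipeline stage needs per frame with `parallelism` MAC units."""
    return math.ceil(macs / parallelism)


def balance_parallelism(layer_macs, mac_budget):
    """Greedy allocation of MAC units to layers (hypothetical heuristic).

    Every layer starts with one MAC unit; each remaining unit goes to the
    current bottleneck stage, because in a layer pipeline the slowest stage
    bounds the frame rate. Ties go to the earliest layer.
    """
    n = len(layer_macs)
    if mac_budget < n:
        raise ValueError("need at least one MAC unit per layer")
    alloc = [1] * n
    for _ in range(mac_budget - n):
        slowest = max(range(n), key=lambda i: stage_cycles(layer_macs[i], alloc[i]))
        alloc[slowest] += 1
    return alloc


def pipeline_fps(layer_macs, alloc, clock_hz):
    """Steady-state frames per second: clock rate divided by the slowest stage."""
    bottleneck = max(stage_cycles(m, p) for m, p in zip(layer_macs, alloc))
    return clock_hz / bottleneck


if __name__ == "__main__":
    macs = [100, 300, 600]  # made-up per-layer MAC counts, not AlexNet figures
    alloc = balance_parallelism(macs, mac_budget=10)
    print(alloc)  # [1, 3, 6]
    print(pipeline_fps(macs, alloc, clock_hz=156e6))  # 1560000.0 at 156 MHz
```

The greedy rule is used only because it makes the bottleneck effect easy to see: the allocation ends up proportional to each layer's workload, so all stages take the same number of cycles. A real design would search over the legal per-layer unroll factors instead.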
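The bandwidth argument for batching the FC layers can be made concrete with a simple traffic model. This is a hedged sketch, not the paper's implementation: it assumes each FC weight is read from off-chip memory once per batch and then reused for all B images in that batch, so per-image weight traffic falls as 1/B. The helper names are hypothetical, activations are ignored because the weight matrix dominates FC-layer traffic, and the example sizes (AlexNet's FC6 layer with 9216 inputs and 4096 outputs, 16-bit weights) are illustrative assumptions.

```python
def fc_weight_traffic_per_image(n_in, n_out, batch_size, bytes_per_weight=2):
    """Off-chip weight bytes fetched per image for one FC layer.

    Hypothetical model: the whole weight matrix is streamed from DRAM once
    per batch and every weight is reused for all images in that batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    total_weight_bytes = n_in * n_out * bytes_per_weight
    return total_weight_bytes / batch_size


def fc_batched(weights, inputs):
    """y[b][o] = sum_i weights[o][i] * inputs[b][i] for every image b in the batch.

    The loop order takes one weight row and applies it to every image before
    moving on, i.e. the reuse pattern the traffic model above assumes.
    """
    outputs = [[0.0] * len(weights) for _ in inputs]
    for o, row in enumerate(weights):   # one weight row "loaded" once...
        for b, x in enumerate(inputs):  # ...and reused for each image
            outputs[b][o] = sum(w * xi for w, xi in zip(row, x))
    return outputs


if __name__ == "__main__":
    # AlexNet FC6: 6*6*256 = 9216 inputs, 4096 outputs; 16-bit weights assumed.
    for b in (1, 8, 32):
        mib = fc_weight_traffic_per_image(9216, 4096, b) / 2**20
        print(f"batch={b:2d}: {mib:.2f} MiB of FC6 weights per image")
```

Under these assumptions FC6 alone needs 72 MiB of weight reads per image without batching, 9 MiB at B = 8 and 2.25 MiB at B = 32. The cost is extra on-chip buffering for the batch's activations and higher per-frame latency.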
Author Wei Cao
Xuegong Zhou
Xitian Fan
Li Jiao
Huimin Li
Lingli Wang
Author_xml – sequence: 1
  surname: Huimin Li
  fullname: Huimin Li
  organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China
– sequence: 2
  surname: Xitian Fan
  fullname: Xitian Fan
  organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China
– sequence: 3
  surname: Li Jiao
  fullname: Li Jiao
  organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China
– sequence: 4
  surname: Wei Cao
  fullname: Wei Cao
  email: caow@fudan.edu.cn
  organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China
– sequence: 5
  surname: Xuegong Zhou
  fullname: Xuegong Zhou
  organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China
– sequence: 6
  surname: Lingli Wang
  fullname: Lingli Wang
  organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/FPL.2016.7577308
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9782839918442
2839918447
EISSN 1946-1488
EndPage 9
ExternalDocumentID 7577308
Genre orig-research
GroupedDBID 6IE
6IF
6IL
6IN
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
OCL
RIE
RIL
IEDL.DBID RIE
ISICitedReferencesCount 167
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000386610400010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 01:40:02 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 9
ParticipantIDs ieee_primary_7577308
PublicationCentury 2000
PublicationDate 2016-08
PublicationDateYYYYMMDD 2016-08-01
PublicationDate_xml – month: 08
  year: 2016
  text: 2016-08
PublicationDecade 2010
PublicationTitle International Conference on Field-programmable Logic and Applications
PublicationTitleAbbrev FPL
PublicationYear 2016
Publisher EPFL
Publisher_xml – name: EPFL
SSID ssj0000547856
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms AlexNet
Bandwidth
Computational modeling
Convolution
convolutional neural networks
memory bandwidth
Neurons
Parallel processing
parallelism
pipeline
System-on-chip
Throughput
Title A high performance FPGA-based accelerator for large-scale convolutional neural networks
URI https://ieeexplore.ieee.org/document/7577308
WOSCitedRecordID wos000386610400010