A high performance FPGA-based accelerator for large-scale convolutional neural networks

Bibliographic details
Published in: International Conference on Field-programmable Logic and Applications, pp. 1-9
Main authors: Huimin Li, Xitian Fan, Li Jiao, Wei Cao, Xuegong Zhou, Lingli Wang
Format: Conference proceeding
Language: English
Published: EPFL, 1 August 2016
Subjects:
ISSN: 1946-1488
Online access: Full text
Abstract In recent years, machine learning algorithms based on convolutional neural networks (CNNs) have been widely applied in computer vision. For large-scale CNNs, however, the computation-intensive, memory-intensive and resource-consuming nature of the networks poses many implementation challenges. This work proposes an end-to-end FPGA-based CNN accelerator with all layers mapped onto one chip, so that different layers can work concurrently in a pipelined structure to increase throughput. A methodology that finds the optimized parallelism strategy for each layer is proposed to achieve high throughput and high resource utilization. In addition, a batch-based computing method is applied to the fully connected (FC) layers to increase memory bandwidth utilization, since these layers are memory-intensive. Further, applying two different computing patterns to the FC layers significantly reduces the required on-chip buffers. As a case study, a state-of-the-art large-scale CNN, AlexNet, is implemented on Xilinx VC709. It achieves a peak performance of 565.94 GOP/s and 391 FPS at a 156 MHz clock frequency, outperforming previous approaches.
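The batch-based idea mentioned in the abstract can be illustrated with a minimal software sketch. This is not the authors' hardware design; it only shows, under the assumption that each FC weight is fetched from memory once and reused for every input in a batch, why batching raises the amount of computation done per weight fetched. The function names `fc_batched` and `ops_per_weight_fetch` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def fc_batched(weights: List[List[float]],
               inputs: List[List[float]]) -> Tuple[List[List[float]], int]:
    """Fully connected layer over a batch, visiting each weight exactly once.

    weights[o][i] connects input neuron i to output neuron o.
    inputs[b] is the input vector of batch element b.
    Returns (outputs, weight_fetches), where outputs[b][o] is the result
    for batch element b and weight_fetches counts simulated memory reads.
    """
    n_out, n_in = len(weights), len(weights[0])
    outputs = [[0.0] * n_out for _ in inputs]
    weight_fetches = 0
    for o in range(n_out):
        for i in range(n_in):
            w = weights[o][i]          # one simulated off-chip fetch
            weight_fetches += 1
            for b, x in enumerate(inputs):
                outputs[b][o] += w * x[i]   # reuse w for every batch element
    return outputs, weight_fetches


def ops_per_weight_fetch(batch_size: int) -> int:
    """Multiply and add operations performed per fetched weight."""
    return 2 * batch_size
```

With a batch of one, every fetched weight feeds a single multiply-accumulate, so the layer is limited by memory traffic; with a batch of N, the same fetch feeds N multiply-accumulates, and the number of weight fetches stays equal to the number of weights regardless of N.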
Author Wei Cao
Xuegong Zhou
Xitian Fan
Li Jiao
Huimin Li
Lingli Wang
Author_xml – sequence: 1
  surname: Huimin Li
  fullname: Huimin Li
  organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China
– sequence: 2
  surname: Xitian Fan
  fullname: Xitian Fan
  organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China
– sequence: 3
  surname: Li Jiao
  fullname: Li Jiao
  organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China
– sequence: 4
  surname: Wei Cao
  fullname: Wei Cao
  email: caow@fudan.edu.cn
  organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China
– sequence: 5
  surname: Xuegong Zhou
  fullname: Xuegong Zhou
  organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China
– sequence: 6
  surname: Lingli Wang
  fullname: Lingli Wang
  organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/FPL.2016.7577308
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEL
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9782839918442
2839918447
EISSN 1946-1488
EndPage 9
ExternalDocumentID 7577308
Genre orig-research
GroupedDBID 6IE
6IF
6IL
6IN
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
OCL
RIE
RIL
ID FETCH-LOGICAL-i241t-9ab9cd5f4e2586525773f9838f05b7b726c53fd01536cf08d9afabe751a451093
IEDL.DBID RIE
ISICitedReferencesCount 168
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000386610400010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 01:40:02 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i241t-9ab9cd5f4e2586525773f9838f05b7b726c53fd01536cf08d9afabe751a451093
PageCount 9
ParticipantIDs ieee_primary_7577308
PublicationCentury 2000
PublicationDate 2016-08
PublicationDateYYYYMMDD 2016-08-01
PublicationDate_xml – month: 08
  year: 2016
  text: 2016-08
PublicationDecade 2010
PublicationTitle International Conference on Field-programmable Logic and Applications
PublicationTitleAbbrev FPL
PublicationYear 2016
Publisher EPFL
Publisher_xml – name: EPFL
SSID ssj0000547856
Score 2.07748
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms AlexNet
Bandwidth
Computational modeling
Convolution
convolutional neural networks
memory bandwidth
Neurons
Parallel processing
parallelism
pipeline
System-on-chip
Throughput
Title A high performance FPGA-based accelerator for large-scale convolutional neural networks
URI https://ieeexplore.ieee.org/document/7577308
WOSCitedRecordID wos000386610400010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint