A high performance FPGA-based accelerator for large-scale convolutional neural networks

Bibliographic details
Published in:	International Conference on Field-programmable Logic and Applications, pp. 1-9
Main authors:	Huimin Li, Xitian Fan, Li Jiao, Wei Cao, Xuegong Zhou, Lingli Wang
Format:	Conference proceeding
Language:	English
Published:	EPFL, 1 August 2016
Subjects:	AlexNet; Bandwidth; Computational modeling; Convolution; convolutional neural networks; memory bandwidth; Neurons; Parallel processing; parallelism; pipeline; System-on-chip; Throughput
ISSN:	1946-1488
Online access:	Full text

Abstract	In recent years, machine learning algorithms based on convolutional neural networks (CNNs) have been widely applied in computer vision. For large-scale CNNs, however, the computation-intensive, memory-intensive and resource-consuming nature of the networks poses many challenges for implementation. This work proposes an end-to-end FPGA-based CNN accelerator with all layers mapped onto one chip, so that different layers can work concurrently in a pipelined structure to increase throughput. A methodology that finds the optimized parallelism strategy for each layer is proposed to achieve high throughput and high resource utilization. In addition, because fully connected (FC) layers are memory-intensive, a batch-based computing method is implemented and applied to them to increase memory bandwidth utilization. Further, by applying two different computing patterns to the FC layers, the required on-chip buffers can be reduced significantly. As a case study, a state-of-the-art large-scale CNN, AlexNet, is implemented on a Xilinx VC709. It achieves a peak performance of 565.94 GOP/s and 391 FPS at a clock frequency of 156 MHz, which outperforms previous approaches.
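The two ideas in the abstract (a layer pipeline whose throughput depends on per-layer parallelism, and batching on FC layers to reuse fetched weights) can be illustrated with a rough back-of-the-envelope model. This is only a sketch under stated assumptions, not the authors' methodology: the function names, the 2-byte weight size, and the simple "slowest stage sets the frame rate" model are all hypothetical choices made for illustration.

```python
import math


def pipeline_fps(layer_macs, layer_parallelism, clock_hz):
    """Estimate frames per second of a layer pipeline (hypothetical model).

    Assumes each layer is its own pipeline stage that performs
    `layer_parallelism[i]` multiply-accumulates per clock cycle, so the
    slowest stage sets the frame rate.
    """
    cycles_per_frame = [
        math.ceil(macs / par) for macs, par in zip(layer_macs, layer_parallelism)
    ]
    return clock_hz / max(cycles_per_frame)


def fc_weight_reuse(n_in, n_out, batch_size, bytes_per_weight=2):
    """MACs performed per byte of weights fetched for one FC layer.

    Assumes every weight is fetched from off-chip memory once per batch and
    reused for all `batch_size` inputs; `bytes_per_weight` is an assumed
    value, not one taken from the paper.
    """
    weights = n_in * n_out
    macs = weights * batch_size
    return macs / (weights * bytes_per_weight)


if __name__ == "__main__":
    # Toy two-stage pipeline: the second stage is the bottleneck.
    print(pipeline_fps([100, 200], [10, 10], clock_hz=1000))
    # FC layer with 9216 inputs and 4096 outputs, batch sizes 1 and 16.
    print(fc_weight_reuse(9216, 4096, batch_size=1))
    print(fc_weight_reuse(9216, 4096, batch_size=16))
```

In this simplified model, giving more parallelism to the slowest layer is what raises the frame rate, and the number of operations performed per fetched weight byte grows linearly with the batch size, which is one way to read the abstract's claim about memory bandwidth utilization.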
Author_xml	1. Huimin Li (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 2. Xitian Fan (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 3. Li Jiao (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 4. Wei Cao, caow@fudan.edu.cn (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 5. Xuegong Zhou (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 6. Lingli Wang (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China)
ContentType	Conference Proceeding
DOI	10.1109/FPL.2016.7577308
EISBN	9782839918442 2839918447
EISSN	1946-1488
EndPage	9
ExternalDocumentID	7577308
Genre	orig-research
ISICitedReferencesCount	168
IsPeerReviewed	false
IsScholarly	false
Language	English
PageCount	9
PublicationCentury	2000
PublicationDate	2016-08
PublicationDateYYYYMMDD	2016-08-01
PublicationDate_xml	– month: 08 year: 2016 text: 2016-08
PublicationDecade	2010
PublicationTitle	International Conference on Field-programmable Logic and Applications
PublicationTitleAbbrev	FPL
PublicationYear	2016
Publisher	EPFL
Publisher_xml	– name: EPFL
StartPage	1
URI	https://ieeexplore.ieee.org/document/7577308