A high performance FPGA-based accelerator for large-scale convolutional neural networks
In recent years, convolutional neural networks (CNNs) based machine learning algorithms have been widely applied in computer vision applications. However, for large-scale CNNs, the computation-intensive, memory-intensive and resource-consuming features have brought many challenges to CNN implementat...
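| Published in: | International Conference on Field-programmable Logic and Applications, pp. 1-9 |
|---|---|
| Main authors: | Huimin Li, Xitian Fan, Li Jiao, Wei Cao, Xuegong Zhou, Lingli Wang |
| Format: | Conference proceeding |
| Language: | English |
| Published: | EPFL, 1 August 2016 |
| Subjects: | AlexNet; Bandwidth; Computational modeling; Convolution; convolutional neural networks; memory bandwidth; Neurons; Parallel processing; parallelism; pipeline; System-on-chip; Throughput |
| ISSN: | 1946-1488 |
| Online access: | Full text |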
| Abstract | In recent years, machine learning algorithms based on convolutional neural networks (CNNs) have been widely applied in computer vision. However, large-scale CNNs are computation-intensive, memory-intensive and resource-consuming, which poses many challenges for CNN implementations. This work proposes an end-to-end FPGA-based CNN accelerator with all the layers mapped on one chip so that different layers can work concurrently in a pipelined structure to increase the throughput. A methodology that finds the optimized parallelism strategy for each layer is proposed to achieve high throughput and high resource utilization. In addition, because fully connected layers (FC layers) are memory-intensive, a batch-based computing method is implemented and applied to them to increase memory bandwidth utilization. Further, by applying two different computing patterns on FC layers, the required on-chip buffers can be reduced significantly. As a case study, a state-of-the-art large-scale CNN, AlexNet, is implemented on a Xilinx VC709. It achieves a peak performance of 565.94 GOP/s and 391 FPS at a 156 MHz clock frequency, which outperforms previous approaches. |
|---|---|
| Author | Wei Cao, Xuegong Zhou, Xitian Fan, Li Jiao, Huimin Li, Lingli Wang |
| Author_xml | 1. Huimin Li (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 2. Xitian Fan (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 3. Li Jiao (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 4. Wei Cao (email: caow@fudan.edu.cn; State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 5. Xuegong Zhou (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 6. Lingli Wang (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China) |
| ContentType | Conference Proceeding |
| DOI | 10.1109/FPL.2016.7577308 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| EISBN | 9782839918442, 2839918447 |
| EISSN | 1946-1488 |
| EndPage | 9 |
| ExternalDocumentID | 7577308 |
| Genre | orig-research |
| ISICitedReferencesCount | 168 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 9 |
| PublicationCentury | 2000 |
| PublicationDate | 2016-08 |
| PublicationDateYYYYMMDD | 2016-08-01 |
| PublicationDate_xml | – month: 08 year: 2016 text: 2016-08 |
| PublicationDecade | 2010 |
| PublicationTitle | International Conference on Field-programmable Logic and Applications |
| PublicationTitleAbbrev | FPL |
| PublicationYear | 2016 |
| Publisher | EPFL |
| Publisher_xml | – name: EPFL |
| StartPage | 1 |
| SubjectTerms | AlexNet; Bandwidth; Computational modeling; Convolution; convolutional neural networks; memory bandwidth; Neurons; Parallel processing; parallelism; pipeline; System-on-chip; Throughput |
| Title | A high performance FPGA-based accelerator for large-scale convolutional neural networks |
| URI | https://ieeexplore.ieee.org/document/7577308 |
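
The abstract's claim that batching helps memory-bound FC layers can be illustrated with a little arithmetic. The sketch below is not the authors' implementation; it only assumes that each FC weight fetched from off-chip memory is reused once per input in the batch, and the 2-byte weight width is a hypothetical default chosen for illustration. The second helper divides the reported peak rate by the reported frame rate, which gives the work per frame only under the assumption that the peak rate is sustained.

```python
def fc_ops_per_weight_byte(batch_size: int, bytes_per_weight: int = 2) -> float:
    """Operations performed per byte of FC weight data fetched.

    Assumption (not from the record): every fetched weight is used for one
    multiply and one add per input in the batch, and bytes_per_weight is a
    hypothetical data width.
    """
    if batch_size < 1 or bytes_per_weight < 1:
        raise ValueError("batch_size and bytes_per_weight must be positive")
    return 2.0 * batch_size / bytes_per_weight


def gop_per_frame(peak_gops: float, fps: float) -> float:
    """Work per frame implied by a throughput and a frame rate.

    Assumption: the peak rate is sustained for the whole frame, so this is
    an upper bound rather than a measured operation count.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return peak_gops / fps


if __name__ == "__main__":
    # Weight reuse grows linearly with the batch size.
    for batch in (1, 4, 16):
        print(batch, fc_ops_per_weight_byte(batch))
    # Figures quoted in the abstract: 565.94 GOP/s and 391 FPS.
    print(round(gop_per_frame(565.94, 391), 3))
```

Under these assumptions, the operations per fetched weight byte scale linearly with the batch size, which is why batching raises the useful work done per unit of memory bandwidth.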