A high performance FPGA-based accelerator for large-scale convolutional neural networks
In recent years, convolutional neural networks (CNNs) based machine learning algorithms have been widely applied in computer vision applications. However, for large-scale CNNs, the computation-intensive, memory-intensive and resource-consuming features have brought many challenges to CNN implementat...
| Published in: | International Conference on Field-programmable Logic and Applications, pp. 1-9 |
|---|---|
| Main authors: | Huimin Li, Xitian Fan, Li Jiao, Wei Cao, Xuegong Zhou, Lingli Wang |
| Format: | Conference proceeding |
| Language: | English |
| Published: | EPFL, 2016-08-01 |
| Subjects: | |
| ISSN: | 1946-1488 |
| Online access: | Full text |
| Abstract | In recent years, machine learning algorithms based on convolutional neural networks (CNNs) have been widely applied in computer vision. For large-scale CNNs, however, the computation-intensive, memory-intensive and resource-consuming nature of the workload poses many challenges for implementation. This work proposes an end-to-end FPGA-based CNN accelerator with all layers mapped onto one chip, so that different layers can work concurrently in a pipelined structure to increase throughput. A methodology that finds the optimized parallelism strategy for each layer is proposed to achieve high throughput and high resource utilization. In addition, a batch-based computing method is applied to the fully connected (FC) layers to increase memory bandwidth utilization, since these layers are memory-intensive. Further, by applying two different computing patterns to the FC layers, the required on-chip buffers can be reduced significantly. As a case study, a state-of-the-art large-scale CNN, AlexNet, is implemented on a Xilinx VC709. It achieves a peak performance of 565.94 GOP/s and 391 FPS at a clock frequency of 156 MHz, which outperforms previous approaches. |
|---|---|
| Author | Wei Cao; Xuegong Zhou; Xitian Fan; Li Jiao; Huimin Li; Lingli Wang |
| Author_xml | – sequence: 1 surname: Huimin Li fullname: Huimin Li organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China – sequence: 2 surname: Xitian Fan fullname: Xitian Fan organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China – sequence: 3 surname: Li Jiao fullname: Li Jiao organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China – sequence: 4 surname: Wei Cao fullname: Wei Cao email: caow@fudan.edu.cn organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China – sequence: 5 surname: Xuegong Zhou fullname: Xuegong Zhou organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China – sequence: 6 surname: Lingli Wang fullname: Lingli Wang organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/FPL.2016.7577308 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9782839918442 2839918447 |
| EISSN | 1946-1488 |
| EndPage | 9 |
| ExternalDocumentID | 7577308 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK OCL RIE RIL |
| ID | FETCH-LOGICAL-i241t-9ab9cd5f4e2586525773f9838f05b7b726c53fd01536cf08d9afabe751a451093 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 167 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000386610400010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 01:40:02 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i241t-9ab9cd5f4e2586525773f9838f05b7b726c53fd01536cf08d9afabe751a451093 |
| PageCount | 9 |
| ParticipantIDs | ieee_primary_7577308 |
| PublicationCentury | 2000 |
| PublicationDate | 2016-08 |
| PublicationDateYYYYMMDD | 2016-08-01 |
| PublicationDate_xml | – month: 08 year: 2016 text: 2016-08 |
| PublicationDecade | 2010 |
| PublicationTitle | International Conference on Field-programmable Logic and Applications |
| PublicationTitleAbbrev | FPL |
| PublicationYear | 2016 |
| Publisher | EPFL |
| Publisher_xml | – name: EPFL |
| SSID | ssj0000547856 |
| Score | 2.0775757 |
| Snippet | In recent years, convolutional neural networks (CNNs) based machine learning algorithms have been widely applied in computer vision applications. However, for... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | AlexNet; Bandwidth; Computational modeling; Convolution; convolutional neural networks; memory bandwidth; Neurons; Parallel processing; parallelism; pipeline; System-on-chip; Throughput |
| Title | A high performance FPGA-based accelerator for large-scale convolutional neural networks |
| URI | https://ieeexplore.ieee.org/document/7577308 |
| WOSCitedRecordID | wos000386610400010&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=International+Conference+on+Field-programmable+Logic+and+Applications&rft.atitle=A+high+performance+FPGA-based+accelerator+for+large-scale+convolutional+neural+networks&rft.au=Huimin+Li&rft.au=Xitian+Fan&rft.au=Li+Jiao&rft.au=Wei+Cao&rft.date=2016-08-01&rft.pub=EPFL&rft.eissn=1946-1488&rft.spage=1&rft.epage=9&rft_id=info:doi/10.1109%2FFPL.2016.7577308&rft.externalDocID=7577308 |
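
The abstract states that a batch-based computing method is applied to the FC layers to increase memory bandwidth utilization. The record gives no implementation details, so the following is only a minimal software sketch of the general idea under stated assumptions: each weight is fetched once and reused across every input in the batch, so the number of weight fetches stays constant as the batch grows. The function names `fc_batched` and `fc_unbatched` and the fetch counter are hypothetical illustrations, not the authors' design.

```python
# Hypothetical sketch: batched fully connected (FC) layer with weight reuse.
# This models only the count of weight fetches, not the FPGA hardware.

def fc_batched(weights, inputs):
    """Compute an FC layer for a whole batch, fetching each weight once.

    weights: list of rows, shape (out_features x in_features)
    inputs:  list of input vectors, one per batch element
    Returns (outputs, weight_loads).
    """
    batch = len(inputs)
    outputs = [[0.0] * len(weights) for _ in range(batch)]
    weight_loads = 0
    for o, row in enumerate(weights):
        for i, w in enumerate(row):
            weight_loads += 1  # assumed: one off-chip fetch per weight
            for b in range(batch):
                # The fetched weight is reused for every batch element.
                outputs[b][o] += w * inputs[b][i]
    return outputs, weight_loads


def fc_unbatched(weights, inputs):
    """Process inputs one at a time, re-fetching all weights per input."""
    outputs, weight_loads = [], 0
    for x in inputs:
        out, loads = fc_batched(weights, [x])
        outputs.append(out[0])
        weight_loads += loads
    return outputs, weight_loads


if __name__ == "__main__":
    w = [[1, 2], [3, 4]]
    xs = [[1, 0], [0, 1], [1, 1]]
    print(fc_batched(w, xs))
    print(fc_unbatched(w, xs))
```

In this toy model both functions produce the same outputs, but the batched version performs a number of weight fetches that does not depend on the batch size, whereas the unbatched version multiplies the fetch count by the batch size. That difference is the intuition for why batching helps a memory-bound layer.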