A high performance FPGA-based accelerator for large-scale convolutional neural networks
In recent years, convolutional neural networks (CNNs) based machine learning algorithms have been widely applied in computer vision applications. However, for large-scale CNNs, the computation-intensive, memory-intensive and resource-consuming features have brought many challenges to CNN implementat...
| Published in: | International Conference on Field-programmable Logic and Applications, pp. 1-9 |
|---|---|
| Main authors: | Huimin Li, Xitian Fan, Li Jiao, Wei Cao, Xuegong Zhou, Lingli Wang |
| Format: | Conference paper |
| Language: | English |
| Published: | EPFL, 1 August 2016 |
| Subjects: | |
| ISSN: | 1946-1488 |
| Online access: | Get full text |
| Abstract | In recent years, machine learning algorithms based on convolutional neural networks (CNNs) have been widely applied in computer vision. However, large-scale CNNs are computation-intensive, memory-intensive and resource-consuming, which poses many challenges for CNN implementations. This work proposes an end-to-end FPGA-based CNN accelerator with all the layers mapped on one chip, so that different layers can work concurrently in a pipelined structure to increase throughput. A methodology that finds the optimized parallelism strategy for each layer is proposed to achieve high throughput and high resource utilization. In addition, a batch-based computing method is applied to the fully connected (FC) layers, which are memory-intensive, to increase memory bandwidth utilization. Further, by applying two different computing patterns to the FC layers, the required on-chip buffers can be reduced significantly. As a case study, a state-of-the-art large-scale CNN, AlexNet, is implemented on a Xilinx VC709. It achieves a peak performance of 565.94 GOP/s and 391 FPS at a 156 MHz clock frequency, which outperforms previous approaches. |
|---|---|
| Author | Wei Cao; Xuegong Zhou; Xitian Fan; Li Jiao; Huimin Li; Lingli Wang |
| Author_xml | – sequence: 1 surname: Huimin Li fullname: Huimin Li organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China – sequence: 2 surname: Xitian Fan fullname: Xitian Fan organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China – sequence: 3 surname: Li Jiao fullname: Li Jiao organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China – sequence: 4 surname: Wei Cao fullname: Wei Cao email: caow@fudan.edu.cn organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China – sequence: 5 surname: Xuegong Zhou fullname: Xuegong Zhou organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China – sequence: 6 surname: Lingli Wang fullname: Lingli Wang organization: State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China |
| ContentType | Conference Proceeding |
| DOI | 10.1109/FPL.2016.7577308 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| EISBN | 9782839918442 2839918447 |
| EISSN | 1946-1488 |
| EndPage | 9 |
| ExternalDocumentID | 7577308 |
| Genre | orig-research |
| ISICitedReferencesCount | 167 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 9 |
| PublicationCentury | 2000 |
| PublicationDate | 2016-08 |
| PublicationDateYYYYMMDD | 2016-08-01 |
| PublicationDate_xml | – month: 08 year: 2016 text: 2016-08 |
| PublicationDecade | 2010 |
| PublicationTitle | International Conference on Field-programmable Logic and Applications |
| PublicationTitleAbbrev | FPL |
| PublicationYear | 2016 |
| Publisher | EPFL |
| Publisher_xml | – name: EPFL |
| StartPage | 1 |
| SubjectTerms | AlexNet; Bandwidth; Computational modeling; Convolution; convolutional neural networks; memory bandwidth; Neurons; Parallel processing; parallelism; pipeline; System-on-chip; Throughput |
| Title | A high performance FPGA-based accelerator for large-scale convolutional neural networks |
| URI | https://ieeexplore.ieee.org/document/7577308 |
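
The figures quoted in the abstract can be related with some rough arithmetic. The sketch below is illustrative only and is not taken from the paper. It divides the reported peak throughput by the reported frame rate to estimate the work per frame (an approximation, since peak and sustained throughput may differ), and it shows why batching helps a memory-bound fully connected layer: a weight fetched once is reused for every input in the batch. The function names and the batch sizes are hypothetical.

```python
"""Back-of-the-envelope checks for the figures quoted in the abstract."""


def gop_per_frame(gops: float, fps: float) -> float:
    """Work per frame (in GOP) implied by a throughput and a frame rate."""
    return gops / fps


def fc_ops_per_weight(batch_size: int) -> int:
    """Operations performed per weight fetched in a batched FC layer.

    Each weight takes part in one multiply and one add for every input
    in the batch, so one fetch from memory serves 2 * batch_size operations.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return 2 * batch_size


if __name__ == "__main__":
    # Figures from the abstract: 565.94 GOP/s peak performance, 391 FPS.
    print(f"implied work per frame: {gop_per_frame(565.94, 391):.3f} GOP")
    # Hypothetical batch sizes, chosen only for illustration.
    for batch in (1, 8, 32):
        print(f"batch {batch:>2}: {fc_ops_per_weight(batch)} ops per weight fetched")
```

Under these assumptions, the operations available per weight fetched grow linearly with the batch size, which is the general reason a batch-based method can raise memory bandwidth utilization on FC layers.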