A high performance FPGA-based accelerator for large-scale convolutional neural networks
In recent years, convolutional neural networks (CNNs) based machine learning algorithms have been widely applied in computer vision applications. However, for large-scale CNNs, the computation-intensive, memory-intensive and resource-consuming features have brought many challenges to CNN implementat...
| Published in: | International Conference on Field-programmable Logic and Applications, pp. 1-9 |
|---|---|
| Main authors: | Huimin Li, Xitian Fan, Li Jiao, Wei Cao, Xuegong Zhou, Lingli Wang |
| Format: | Conference proceeding |
| Language: | English |
| Published: | EPFL, 1 August 2016 |
| Subjects: | |
| ISSN: | 1946-1488 |
| Online access: | Full text |
| Abstract | In recent years, machine learning algorithms based on convolutional neural networks (CNNs) have been widely applied in computer vision. However, large-scale CNNs are computation-intensive, memory-intensive and resource-consuming, which poses many challenges for CNN implementations. This work proposes an end-to-end FPGA-based CNN accelerator with all layers mapped onto one chip, so that different layers can work concurrently in a pipelined structure to increase throughput. A methodology that finds the optimized parallelism strategy for each layer is proposed to achieve high throughput and high resource utilization. In addition, a batch-based computing method is implemented and applied to the fully connected (FC) layers to increase memory bandwidth utilization, since these layers are memory-intensive. Further, by applying two different computing patterns to the FC layers, the required on-chip buffers can be reduced significantly. As a case study, a state-of-the-art large-scale CNN, AlexNet, is implemented on a Xilinx VC709. It achieves a peak performance of 565.94 GOP/s and 391 FPS at a 156 MHz clock frequency, which outperforms previous approaches. |
|---|---|
| Author | Wei Cao; Xuegong Zhou; Xitian Fan; Li Jiao; Huimin Li; Lingli Wang |
| Author_xml | 1. Huimin Li (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 2. Xitian Fan (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 3. Li Jiao (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 4. Wei Cao (caow@fudan.edu.cn; State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 5. Xuegong Zhou (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China); 6. Lingli Wang (State Key Lab. of ASIC & Syst., Fudan Univ., Shanghai, China) |
| ContentType | Conference Proceeding |
| DOI | 10.1109/FPL.2016.7577308 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Xplore; IEEE Proceedings Order Plans (POP All) 1998-Present |
| EISBN | 9782839918442; 2839918447 |
| EISSN | 1946-1488 |
| EndPage | 9 |
| ExternalDocumentID | 7577308 |
| Genre | orig-research |
| ISICitedReferencesCount | 168 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 9 |
| PublicationCentury | 2000 |
| PublicationDate | 2016-08 |
| PublicationDateYYYYMMDD | 2016-08-01 |
| PublicationDecade | 2010 |
| PublicationTitle | International Conference on Field-programmable Logic and Applications |
| PublicationTitleAbbrev | FPL |
| PublicationYear | 2016 |
| Publisher | EPFL |
| StartPage | 1 |
| SubjectTerms | AlexNet; Bandwidth; Computational modeling; Convolution; convolutional neural networks; memory bandwidth; Neurons; Parallel processing; parallelism; pipeline; System-on-chip; Throughput |
| Title | A high performance FPGA-based accelerator for large-scale convolutional neural networks |
| URI | https://ieeexplore.ieee.org/document/7577308 |
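
The abstract mentions a batch-based computing method for the fully connected layers, aimed at better memory bandwidth utilization. This record does not show the paper's implementation, so the following is only a minimal software sketch of the general idea, assuming that each weight is fetched once and then reused for every input in the batch. The function name `fc_batched` and the counters it returns are hypothetical and chosen purely for illustration.

```python
def fc_batched(weights, inputs):
    """Illustrative batched fully connected layer (not the paper's design).

    weights: list of rows, one row per output neuron (n_out x n_in).
    inputs:  list of B input vectors, each of length n_in.
    Returns (outputs, weight_fetches, macs), where weight_fetches models
    the number of weight reads and macs counts multiply-accumulates.
    """
    n_out = len(weights)
    batch = len(inputs)
    outputs = [[0.0] * n_out for _ in range(batch)]
    weight_fetches = 0
    macs = 0
    for o, row in enumerate(weights):
        for i, w in enumerate(row):
            weight_fetches += 1  # assumed: one memory read per weight
            for b in range(batch):
                # The same fetched weight is reused for all B inputs.
                outputs[b][o] += w * inputs[b][i]
                macs += 1
    return outputs, weight_fetches, macs


if __name__ == "__main__":
    out, fetches, macs = fc_batched(
        [[1.0, 2.0], [3.0, 4.0]],   # 2 outputs x 2 inputs
        [[1.0, 1.0], [2.0, 0.0]],   # batch of 2 input vectors
    )
    print(out, fetches, macs, macs / fetches)
```

In this model the ratio of multiply-accumulates to weight fetches equals the batch size, which is why batching helps a layer whose cost is dominated by reading weights.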