High-performance video content recognition with long-term recurrent convolutional network for FPGA
| Published in: | International Conference on Field-programmable Logic and Applications, pp. 1-4 |
|---|---|
| Main authors: | Xiaofan Zhang, Xinheng Liu, Anand Ramachandran, Chuanhao Zhuge, Shibin Tang, Peng Ouyang, Zuofu Cheng, Kyle Rupnow, Deming Chen |
| Format: | Conference paper |
| Language: | English |
| Publication details: | Ghent University, 2017-09-01 |
| Subject: | |
| ISSN: | 1946-1488 |
| Online access: | Get full text |
| Abstract | FPGA is a promising candidate for the acceleration of Deep Neural Networks (DNN) with improved latency and energy consumption compared to CPU and GPU-based implementations. DNNs use sequences of layers of regular computation that are well suited for HLS-based design for FPGA. However, optimizing large neural networks under resource constraints is still a key challenge. HLS must manage on-chip computation, buffering resources, and off-chip memory accesses to minimize the total latency. In this paper, we present a design framework for DNNs that uses highly configurable IPs for neural network layers together with a new design space exploration engine for Resource Allocation Management (REALM). We also carry out efficient memory subsystem design and fixed-point weight re-training to further improve our FPGA solution. We demonstrate our design framework on the Long-term Recurrent Convolution Network for video inputs. Our implementation on a Xilinx VC709 board achieves 3.1X speedup compared to an NVIDIA K80 and 4.75X speedup compared to an Intel Xeon with 17.5X lower energy per image. |
|---|---|
| Author | Deming Chen; Peng Ouyang; Ramachandran, Anand; Xiaofan Zhang; Zuofu Cheng; Shibin Tang; Rupnow, Kyle; Chuanhao Zhuge; Xinheng Liu |
| Author_xml | – sequence: 1 surname: Xiaofan Zhang fullname: Xiaofan Zhang – sequence: 2 surname: Xinheng Liu fullname: Xinheng Liu – sequence: 3 givenname: Anand surname: Ramachandran fullname: Ramachandran, Anand – sequence: 4 surname: Chuanhao Zhuge fullname: Chuanhao Zhuge – sequence: 5 surname: Shibin Tang fullname: Shibin Tang – sequence: 6 surname: Peng Ouyang fullname: Peng Ouyang – sequence: 7 surname: Zuofu Cheng fullname: Zuofu Cheng – sequence: 8 givenname: Kyle surname: Rupnow fullname: Rupnow, Kyle – sequence: 9 surname: Deming Chen fullname: Deming Chen |
| BookMark | eNotkM1OAjEcxKvRREAewHjpCyz2Y7vbHgkRMNlEDnom3e6_S3VpSSkQ357dyFwmmfllDjNGDz54QOiFkhnjiqq35aaaMULLmSSikJzfobEiinCSM8nu0YiqvMhoLuUTmh6PP6SXyEspihGq167dZQeINsS99gbw2TUQsAk-gU84ggmtd8kFjy8u7XAXfJsliPuhOsU4QD18Dt1pgHSHPaRLiL-4X8TLzWr-jB6t7o4wvfkEfS_fvxbrrPpcfSzmVeZoKVKmC1HnpqaCWs0NIRJqU9dGGaXA5lZS0E2fGkqMYrQQmlhmWSlL1jQGCOUT9Pq_6wBge4hur-Pf9nYJvwI7Vll7 |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.23919/FPL.2017.8056833 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9090304282 9789090304281 |
| EISSN | 1946-1488 |
| EndPage | 4 |
| ExternalDocumentID | 8056833 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK OCL RIE RIL |
| ID | FETCH-LOGICAL-i175t-a65b4cb151fa3c008ebcbbc9c99ef4f81ead008c10c92165a0f2f27872ddce013 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 26 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000426989400077&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:28:39 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i175t-a65b4cb151fa3c008ebcbbc9c99ef4f81ead008c10c92165a0f2f27872ddce013 |
| PageCount | 4 |
| ParticipantIDs | ieee_primary_8056833 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-Sept. |
| PublicationDateYYYYMMDD | 2017-09-01 |
| PublicationDate_xml | – month: 09 year: 2017 text: 2017-Sept. |
| PublicationDecade | 2010 |
| PublicationTitle | International Conference on Field-programmable Logic and Applications |
| PublicationTitleAbbrev | FPL |
| PublicationYear | 2017 |
| Publisher | Ghent University |
| Publisher_xml | – name: Ghent University |
| SSID | ssj0000547856 |
| Score | 1.9950753 |
| Snippet | FPGA is a promising candidate for the acceleration of Deep Neural Networks (DNN) with improved latency and energy consumption compared to CPU and GPU-based... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Field programmable gate arrays; IP networks; Mathematical model; Neural networks; Optimization; Quantization (signal); Resource management |
| Title | High-performance video content recognition with long-term recurrent convolutional network for FPGA |
| URI | https://ieeexplore.ieee.org/document/8056833 |
| WOSCitedRecordID | wos000426989400077&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| link | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV07a8MwED6S0KFTW5LSNxo6VoltOZY0llK3QwgeWsgW9CyBYIc0ye-vzjYOhS7dxOkF0knfcfruBPDImWdWx5J6lzKaJiqmgumIou2BDizBXJ0yf8bnc7FYyKIHT10sjHOuJp-5MRbrt3xbmT26yiYioLVgrA99znkTq9X5UyJMTDXNmofLhMlYTvJihtwtPm77_fpApcaP_Ox_M5_D6BiIR4oOYi6g58ohaORm0M2R8U8wlq4iyDoPo5COE1SVBN2sZF2VXxSvYKxq8jFh40OrdWpNyoYMTsKIJC_enkfwmb9-vLzT9qsEugr4v6Mqm-rU6ADfXjETcN1po7WRRkrnUy_ioDBBauLIyCTOpiryiU_CYU2sNS6YgZcwKKvSXQEJ9qBKM6eVTVyaWauk4poxnalICsP4NQxxfZabJhvGsl2am7_Ft3CKW9Cwsu5gsNvu3T2cmMNu9b19qLfwBz1on-M |
| linkProvider | IEEE |
| linkToHtml | http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3NS8MwFH_MKehJZRO_zcGj2doka5ujiHViHT1M2G0kaSoDacfc9veb15YOwYu38PIFyUt-j5ffewG4D3nOM-1LmlvBqWDKpxHXHkXbAx1YEbdVyvwknEyi2UymHXhoY2GstRX5zA6wWL3lZ6XZoKtsGDm0jjjfg_2REMyvo7Vaj4qHqalGQf10ybj05TBOE2RvhYOm568vVCoEiY__N_cJ9HeheCRtQeYUOrbogUZ2Bl3uOP8Eo-lKgrxzNwppWUFlQdDRSr7K4pPiJYxVdUYmbLxt9E59kaKmgxM3IonTl8c-fMTP06cxbT5LoAtnAaypCkZaGO0APFfcOGS32mhtpJHS5iKPfKcyTmp8z0jmByPl5Sxn7riyLDPWGYJn0C3Kwp4DcRahEoHVKmNWBFmmpAo15zpQnowMDy-gh-szX9b5MObN0lz-Lb6Dw_H0PZknr5O3KzjC7ag5WtfQXa829gYOzHa9-F7dVtv5A7i1oyo |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=International+Conference+on+Field-programmable+Logic+and+Applications&rft.atitle=High-performance+video+content+recognition+with+long-term+recurrent+convolutional+network+for+FPGA&rft.au=Xiaofan+Zhang&rft.au=Xinheng+Liu&rft.au=Ramachandran%2C+Anand&rft.au=Chuanhao+Zhuge&rft.date=2017-09-01&rft.pub=Ghent+University&rft.eissn=1946-1488&rft.spage=1&rft.epage=4&rft_id=info:doi/10.23919%2FFPL.2017.8056833&rft.externalDocID=8056833 |