Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA
Saved in:
| Published in: | International Conference on Field-programmable Logic and Applications, pp. 1 - 8 |
|---|---|
| Main Authors: | Yufei Ma, Naveen Suda, Yu Cao, Jae-sun Seo, Sarma Vrudhula |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | EPFL, 01.08.2016 |
| Subjects: | Acceleration; Algorithm design and analysis; Convolution; Convolutional neural networks; Field programmable gate arrays; FPGA; Hardware; hardware acceleration; Kernel; Memory management |
| ISSN: | 1946-1488 |
| Online Access: | Get full text |
| Abstract | Despite their popularity, deploying Convolutional Neural Networks (CNNs) on a portable system is still challenging due to large data volume, intensive computation, and frequent memory access. Although previous FPGA acceleration schemes generated by high-level synthesis tools (e.g., HLS, OpenCL) have allowed for fast design optimization, hardware inefficiency still exists when allocating FPGA resources to maximize parallelism and throughput. A direct hardware-level design (i.e., RTL) can improve the efficiency and achieve greater acceleration. However, this requires an in-depth understanding of both the algorithm structure and the FPGA system architecture. In this work, we present a scalable solution that integrates the flexibility of high-level synthesis and the finer-level optimization of an RTL implementation. The cornerstone is a compiler that analyzes the CNN structure and parameters, and automatically generates a set of modular and scalable computing primitives that can accelerate various deep learning algorithms. Integrating these modules together for end-to-end CNN implementations, this work quantitatively analyzes the compiler's design strategy to optimize the throughput of a given CNN model under the FPGA resource constraints. The proposed methodology is demonstrated on an Altera Stratix-V GXA7 FPGA for the AlexNet and NIN CNN models, achieving 114.5 GOPS and 117.3 GOPS, respectively. This represents a 1.9× improvement in throughput when compared to the OpenCL-based design. The results illustrate the promise of the automatic compiler solution for modularized and scalable hardware acceleration of deep learning. |
|---|---|
| Author | Yufei Ma; Naveen Suda; Yu Cao; Jae-sun Seo; Sarma Vrudhula |
| Author_xml | 1: Yufei Ma (yufeima@asu.edu), School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA; 2: Naveen Suda (nsuda@asu.edu), School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA; 3: Yu Cao (yu.cao@asu.edu), School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA; 4: Jae-sun Seo (jaesun.seo@asu.edu), School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA; 5: Sarma Vrudhula (vrudhula@asu.edu), School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA |
| ContentType | Conference Proceeding |
| DOI | 10.1109/FPL.2016.7577356 |
| EISBN | 9782839918442 2839918447 |
| EISSN | 1946-1488 |
| EndPage | 8 |
| ExternalDocumentID | 7577356 |
| Genre | orig-research |
| ISICitedReferencesCount | 100 |
| Language | English |
| PageCount | 8 |
| PublicationDate | 2016-08 |
| PublicationTitle | International Conference on Field-programmable Logic and Applications |
| PublicationTitleAbbrev | FPL |
| PublicationYear | 2016 |
| Publisher | EPFL |
| StartPage | 1 |
| SubjectTerms | Acceleration; Algorithm design and analysis; Convolution; Convolutional neural networks; Field programmable gate arrays; FPGA; Hardware; hardware acceleration; Kernel; Memory management |
| Title | Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA |
| URI | https://ieeexplore.ieee.org/document/7577356 |
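
The abstract above describes a compiler that analyzes a CNN's layer parameters and quantitatively decides how to allocate FPGA resources so that throughput is maximized. The Python sketch below is only a toy illustration of that style of resource-constrained design-space exploration: the layer shapes, DSP budget, clock frequency, and cycle model are invented placeholders and do not come from the paper's compiler or its cost model.

```python
# Toy design-space exploration for CNN-to-FPGA compilation.
# All layer shapes, resource costs, and the throughput model below are
# hypothetical placeholders, NOT the cost model used in the paper.

from dataclasses import dataclass
from itertools import product


@dataclass
class ConvLayer:
    name: str
    out_ch: int    # number of output feature maps
    in_ch: int     # number of input feature maps
    out_size: int  # output feature-map width/height (square)
    kernel: int    # convolution kernel width/height (square)


# A few AlexNet-like layer shapes, used purely for illustration.
LAYERS = [
    ConvLayer("conv1", 96, 3, 55, 11),
    ConvLayer("conv2", 256, 48, 27, 5),
    ConvLayer("conv3", 384, 256, 13, 3),
]

DSP_BUDGET = 256   # assumed number of DSP blocks available for parallel MACs
CLOCK_MHZ = 100    # assumed operating frequency


def layer_macs(layer: ConvLayer) -> int:
    """Multiply-accumulate operations needed by one convolution layer."""
    return (layer.out_ch * layer.in_ch *
            layer.out_size ** 2 * layer.kernel ** 2)


def cycles(layer: ConvLayer, p_out: int, p_in: int) -> float:
    """Toy latency model: total MACs divided by the unrolled parallelism."""
    return layer_macs(layer) / (p_out * p_in)


def best_unrolling(layer: ConvLayer):
    """Search output/input-channel unroll factors under the DSP budget."""
    best = None
    for p_out, p_in in product(range(1, layer.out_ch + 1),
                               range(1, layer.in_ch + 1)):
        if p_out * p_in > DSP_BUDGET:  # one DSP per parallel MAC (assumed)
            continue
        c = cycles(layer, p_out, p_in)
        if best is None or c < best[2]:
            best = (p_out, p_in, c)
    return best


if __name__ == "__main__":
    total_cycles = 0.0
    for layer in LAYERS:
        p_out, p_in, c = best_unrolling(layer)
        total_cycles += c
        print(f"{layer.name}: unroll (out={p_out}, in={p_in}), {c:.0f} cycles")

    total_ops = sum(2 * layer_macs(l) for l in LAYERS)  # 2 ops per MAC
    seconds = total_cycles / (CLOCK_MHZ * 1e6)
    print(f"Estimated throughput: {total_ops / seconds / 1e9:.1f} GOPS (toy model)")
```

In a flow of this kind, the chosen unroll factors would parameterize the generated RTL computing primitives; here the script merely prints a per-layer choice and a rough GOPS estimate from the toy model.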