DaDianNao: A Machine-Learning Supercomputer
Saved in:

| Published in: | 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 609-622 |
|---|---|
| Main authors: | Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, Olivier Temam |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 1 December 2014 |
| Subjects: | Accelerator; Bandwidth; Biological neural networks; Computer architecture; Graphics processing units; Hardware; Kernel; Machine learning; Neural network; Neurons |
| ISSN: | 1072-4451 |
| ISBN: | 9781479969982 |
| DOI: | 10.1109/MICRO.2014.58 |
| Online access: | Full text (https://ieeexplore.ieee.org/document/7011421) |
Abstract:

Many companies are deploying services, for both consumers and industry, that are largely based on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computationally and memory intensive. A number of neural network accelerators have recently been proposed that offer a high computational capacity/area ratio, but they remain hampered by memory accesses. However, unlike the memory wall faced by processors on general-purpose workloads, the CNN and DNN memory footprint, while large, is not beyond the capability of the on-chip storage of a multi-chip system. This property, combined with the algorithmic characteristics of CNNs and DNNs, can lead to high internal bandwidth and low external communication, which in turn enables a high degree of parallelism at a reasonable area cost. In this article, we introduce a custom multi-chip machine-learning architecture along those lines. We show that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU and to reduce energy by 150.31x on average for a 64-chip system. We implement the node down to place and route at 28 nm; it contains a combination of custom storage and computational units, with industry-grade interconnects.
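The abstract's central claim is quantitative: once a network's synaptic weights fit inside the aggregate on-chip storage of a multi-chip system, only neuron values need to cross chip boundaries, which keeps external traffic low. A minimal sketch of that fit-on-chip arithmetic follows (Python); the layer size, weight width, node count, and per-node storage are illustrative assumptions, not figures reported in the paper.

```python
# Back-of-envelope check of the abstract's premise: do the synaptic weights of a
# large neural network layer fit in the combined on-chip storage of a multi-node
# system? All sizes below are illustrative assumptions, not figures from the paper.

def fc_weight_bytes(num_inputs: int, num_outputs: int, bytes_per_weight: int = 2) -> int:
    """Weight storage for a fully connected layer (num_inputs x num_outputs synapses)."""
    return num_inputs * num_outputs * bytes_per_weight

def fits_on_chip(layer_bytes: int, nodes: int, mib_per_node: float) -> bool:
    """True if the layer's weights fit in the aggregate on-chip memory of `nodes` chips."""
    return layer_bytes <= nodes * mib_per_node * 2**20

if __name__ == "__main__":
    # Hypothetical large fully connected layer: 4096 x 4096 neurons, 16-bit weights (~32 MiB).
    layer = fc_weight_bytes(4096, 4096, bytes_per_weight=2)
    print(f"layer weights: {layer / 2**20:.1f} MiB")

    # A single chip with a few MiB of on-chip SRAM: the weights spill to external DRAM.
    print("1 chip,   8 MiB/node:", fits_on_chip(layer, nodes=1, mib_per_node=8))

    # A multi-chip system whose nodes carry tens of MiB of on-chip eDRAM: the weights stay
    # on chip, so only neuron activations cross chip boundaries.
    print("64 chips, 32 MiB/node:", fits_on_chip(layer, nodes=64, mib_per_node=32))
```

Under these assumed numbers, the single-chip case spills to DRAM while the 64-node system holds the weights entirely on chip, which is the property the paper exploits to trade external memory bandwidth for on-chip bandwidth.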
Authors:

- Yunji Chen (SKL of Computer Architecture, ICT, Beijing, China)
- Tao Luo (SKL of Computer Architecture, ICT, Beijing, China)
- Shaoli Liu (SKL of Computer Architecture, ICT, Beijing, China)
- Shijin Zhang (SKL of Computer Architecture, ICT, Beijing, China)
- Liqiang He (Inria, Saclay, France)
- Jia Wang (SKL of Computer Architecture, ICT, Beijing, China)
- Ling Li (SKL of Computer Architecture, ICT, Beijing, China)
- Tianshi Chen (SKL of Computer Architecture, ICT, Beijing, China)
- Zhiwei Xu (SKL of Computer Architecture, ICT, Beijing, China)
- Ninghui Sun (SKL of Computer Architecture, ICT, Beijing, China)
- Olivier Temam (Inria, Saclay, France)