DaDianNao: A Machine-Learning Supercomputer

Bibliographic Details
Published in: 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 609-622
Main authors: Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, Olivier Temam
Format: Conference Proceeding
Language: English
Published: IEEE, December 2014
Subjects: accelerator; bandwidth; biological neural networks; computer architecture; graphics processing units; hardware; kernel; machine learning; neural network; neurons
ISSN: 1072-4451
DOI: 10.1109/MICRO.2014.58
Cited (Web of Science): 1016
Online access: Full text, https://ieeexplore.ieee.org/document/7011421

Abstract: Many companies are deploying services, for consumers or industry, that are largely based on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computationally and memory intensive. A number of neural network accelerators have recently been proposed that offer a high ratio of computational capacity to area, but they remain hampered by memory accesses. However, unlike the memory wall faced by processors on general-purpose workloads, the memory footprint of CNNs and DNNs, while large, is not beyond the capability of the on-chip storage of a multi-chip system. This property, combined with the algorithmic characteristics of CNNs and DNNs, can lead to high internal bandwidth and low external communication, which can in turn enable a high degree of parallelism at a reasonable area cost. In this article, we introduce a custom multi-chip machine-learning architecture along those lines. We show that, on a subset of the largest known neural network layers, a 64-chip system can achieve a speedup of 450.65x over a GPU and reduce energy by 150.31x on average. We implement the node down to place-and-route at 28 nm; it contains a combination of custom storage and computational units, with industry-grade interconnects.
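
The abstract's central premise, that the weights of even the largest layers exceed one chip's storage but fit in the aggregate on-chip storage of a multi-chip system, can be sanity-checked with simple arithmetic. The sketch below is a minimal illustration, assuming 36 MB of on-chip eDRAM per node, 16-bit weights, and a hypothetical 25,000 x 25,000 fully connected layer; none of these figures are asserted by this record.

```python
# Back-of-the-envelope check of the multi-chip premise: a large layer's
# synaptic weights overflow a single chip's on-chip storage but fit in
# the combined storage of a 64-chip system. All parameters below are
# illustrative assumptions, not figures taken from the paper.

EDRAM_PER_CHIP_MB = 36        # assumed on-chip eDRAM per node
NUM_CHIPS = 64                # system size evaluated in the abstract
BYTES_PER_WEIGHT = 2          # assumed 16-bit fixed-point weights

def weights_mb(num_inputs: int, num_outputs: int) -> float:
    """Synaptic weight footprint of a fully connected layer, in MB."""
    return num_inputs * num_outputs * BYTES_PER_WEIGHT / 2**20

# Hypothetical large fully connected layer: 25K inputs x 25K outputs.
layer_mb = weights_mb(25_000, 25_000)
total_on_chip_mb = EDRAM_PER_CHIP_MB * NUM_CHIPS

print(f"layer weights:  {layer_mb:8.1f} MB")
print(f"one chip:       {EDRAM_PER_CHIP_MB:8.1f} MB -> fits: {layer_mb <= EDRAM_PER_CHIP_MB}")
print(f"64-chip system: {total_on_chip_mb:8.1f} MB -> fits: {layer_mb <= total_on_chip_mb}")
```

Under these assumptions the layer needs roughly 1.2 GB, far beyond one chip but comfortably within the roughly 2.3 GB of aggregate on-chip storage, which is the property the architecture exploits to keep weight traffic internal.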
Authors and affiliations:
1. Yunji Chen — State Key Laboratory of Computer Architecture, ICT, Beijing, China
2. Tao Luo — State Key Laboratory of Computer Architecture, ICT, Beijing, China
3. Shaoli Liu — State Key Laboratory of Computer Architecture, ICT, Beijing, China
4. Shijin Zhang — State Key Laboratory of Computer Architecture, ICT, Beijing, China
5. Liqiang He — Inria Saclay, France
6. Jia Wang — State Key Laboratory of Computer Architecture, ICT, Beijing, China
7. Ling Li — State Key Laboratory of Computer Architecture, ICT, Beijing, China
8. Tianshi Chen — State Key Laboratory of Computer Architecture, ICT, Beijing, China
9. Zhiwei Xu — State Key Laboratory of Computer Architecture, ICT, Beijing, China
10. Ninghui Sun — State Key Laboratory of Computer Architecture, ICT, Beijing, China
11. Olivier Temam — Inria Saclay, France