Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads

Bibliographic details
Published in: 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), pp. 145-158
Main authors: Abts, Dennis; Ross, Jonathan; Sparling, Jonathan; Wong-VanHaren, Mark; Baker, Max; Hawkins, Tom; Bell, Andrew; Thompson, John; Kahsai, Temesghen; Kimmell, Garrin; Hwang, Jennifer; Leslie-Hurd, Rebekah; Bye, Michael; Creswick, E.R.; Boyd, Matthew; Venigalla, Mahitha; Laforge, Evan; Purdy, Jon; Kamath, Purushotham; Maheshwari, Dinesh; Beidler, Michael; Rosseel, Geert; Ahmad, Omar; Gagarin, Gleb; Czekalski, Richard; Rane, Ashay; Parmar, Sahil; Werner, Jeff; Sproch, Jim; Macias, Adrian; Kurtz, Brian
Format: Conference proceeding
Language: English
Published: IEEE, 1 May 2020
Abstract: In this paper, we introduce the Tensor Streaming Processor (TSP) architecture, a functionally sliced microarchitecture with memory units interleaved with vector and matrix deep learning functional units in order to take advantage of the dataflow locality of deep learning operations. The TSP is built on two key observations: (1) machine learning workloads exhibit abundant data parallelism, which can be readily mapped to tensors in hardware, and (2) a simple and deterministic processor with a producer-consumer stream programming model enables precise reasoning about and control of hardware components, achieving good performance and power efficiency. The TSP is designed to exploit the parallelism inherent in machine learning workloads, including instruction-level parallelism, memory concurrency, and data and model parallelism, while guaranteeing determinism by eliminating all reactive elements in the hardware (e.g., arbiters and caches). Early ResNet50 image classification results demonstrate 20.4K processed images per second (IPS) with a batch size of one, a 4× improvement compared to other modern GPUs and accelerators [44]. Our first ASIC implementation of the TSP architecture yields a computational density of more than 1 TeraOp/s per square mm of silicon for its 25 × 29 mm 14 nm chip operating at a nominal clock frequency of 900 MHz. The TSP demonstrates a novel hardware-software approach to achieving fast, yet predictable, performance on machine learning workloads within a desired power envelope.
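The headline figures in the abstract can be combined into a few derived quantities. The following is a rough back-of-the-envelope sketch, not a result reported by the authors: it assumes the quoted density (taken here as the lower bound of 1 TeraOp/s per square mm) applies across the full 25 × 29 mm die, and it treats the reciprocal of throughput as a mean time per image, which is not necessarily the end-to-end latency. The function name `derived_figures` and the constant names are illustrative only.

```python
# Back-of-the-envelope figures derived only from numbers quoted in the abstract.
# All names below are hypothetical; they do not come from the paper.

IMAGES_PER_SECOND = 20_400     # ResNet50 throughput at batch size 1
DIE_WIDTH_MM = 25              # chip dimensions quoted in the abstract
DIE_HEIGHT_MM = 29
CLOCK_HZ = 900e6               # nominal clock frequency (900 MHz)
OPS_PER_SEC_PER_MM2 = 1e12     # lower bound: "more than 1 TeraOp/s per square mm"


def derived_figures():
    """Return quantities implied by the abstract under the stated assumptions."""
    # Assumption: the density figure applies to the whole die area.
    die_area_mm2 = DIE_WIDTH_MM * DIE_HEIGHT_MM
    min_total_ops_per_sec = die_area_mm2 * OPS_PER_SEC_PER_MM2
    return {
        "die_area_mm2": die_area_mm2,
        # Reciprocal of throughput, in microseconds; a mean, not a measured latency.
        "mean_time_per_image_us": 1e6 / IMAGES_PER_SECOND,
        # Lower bound on aggregate throughput, in TeraOps per second.
        "min_total_teraops_per_sec": min_total_ops_per_sec / 1e12,
        # Lower bound on operations completed per clock cycle.
        "min_ops_per_cycle": min_total_ops_per_sec / CLOCK_HZ,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the die area is 725 square mm, which implies more than 725 TeraOps/s in aggregate and roughly 49 microseconds per image on average at batch size one.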
Author affiliation (all 31 authors): Groq, Inc., Mountain View, California
DOI 10.1109/ISCA45697.2020.00023
EISBN 1728146615
9781728146614
Subject terms: Computer architecture; Data models; Deep learning; Hardware; Microarchitecture; Parallel processing; System-on-chip; Tensors; Transistors; Vectors
URI https://ieeexplore.ieee.org/document/9138986