Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads
In this paper, we introduce the Tensor Streaming Processor (TSP) architecture, a functionally-sliced microarchitecture with memory units interleaved with vector and matrix deep learning functional units in order to take advantage of dataflow locality of deep learning operations. The TSP is built bas...
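| Published in: | 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), pp. 145-158 |
|---|---|
| Main Authors: | Abts, Dennis; Ross, Jonathan; Sparling, Jonathan; Wong-VanHaren, Mark; Baker, Max; Hawkins, Tom; Bell, Andrew; Thompson, John; Kahsai, Temesghen; Kimmell, Garrin; Hwang, Jennifer; Leslie-Hurd, Rebekah; Bye, Michael; Creswick, E.R.; Boyd, Matthew; Venigalla, Mahitha; Laforge, Evan; Purdy, Jon; Kamath, Purushotham; Maheshwari, Dinesh; Beidler, Michael; Rosseel, Geert; Ahmad, Omar; Gagarin, Gleb; Czekalski, Richard; Rane, Ashay; Parmar, Sahil; Werner, Jeff; Sproch, Jim; Macias, Adrian; Kurtz, Brian |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 1 May 2020 |
| Subjects: | |
| Online Access: | Full text |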
| Abstract | In this paper, we introduce the Tensor Streaming Processor (TSP) architecture, a functionally-sliced microarchitecture with memory units interleaved with vector and matrix deep learning functional units in order to take advantage of dataflow locality of deep learning operations. The TSP is built based on two key observations: (1) machine learning workloads exhibit abundant data parallelism, which can be readily mapped to tensors in hardware, and (2) a simple and deterministic processor with a producer-consumer stream programming model enables precise reasoning and control of hardware components, achieving good performance and power efficiency. The TSP is designed to exploit parallelism inherent in machine-learning workloads including instruction-level, memory concurrency, data and model parallelism, while guaranteeing determinism by eliminating all reactive elements in the hardware (e.g., arbiters and caches). Early ResNet50 image classification results demonstrate 20.4K processed images per second (IPS) with a batch size of one, a 4× improvement compared to other modern GPUs and accelerators [44]. Our first ASIC implementation of the TSP architecture yields a computational density of more than 1 TeraOp/s per square mm of silicon for its 25 × 29 mm, 14 nm chip operating at a nominal clock frequency of 900 MHz. The TSP demonstrates a novel hardware-software approach to achieve fast, yet predictable, performance on machine-learning workloads within a desired power envelope. |
|---|---|
| Author | Maheshwari, Dinesh; Baker, Max; Ahmad, Omar; Rosseel, Geert; Sparling, Jonathan; Kahsai, Temesghen; Abts, Dennis; Bye, Michael; Ross, Jonathan; Purdy, Jon; Rane, Ashay; Kurtz, Brian; Bell, Andrew; Thompson, John; Hawkins, Tom; Creswick, E.R.; Gagarin, Gleb; Boyd, Matthew; Sproch, Jim; Werner, Jeff; Venigalla, Mahitha; Laforge, Evan; Kimmell, Garrin; Kamath, Purushotham; Macias, Adrian; Parmar, Sahil; Beidler, Michael; Leslie-Hurd, Rebekah; Czekalski, Richard; Wong-VanHaren, Mark; Hwang, Jennifer |
| Author_xml | Author sequence: 1. Abts, Dennis; 2. Ross, Jonathan; 3. Sparling, Jonathan; 4. Wong-VanHaren, Mark; 5. Baker, Max; 6. Hawkins, Tom; 7. Bell, Andrew; 8. Thompson, John; 9. Kahsai, Temesghen; 10. Kimmell, Garrin; 11. Hwang, Jennifer; 12. Leslie-Hurd, Rebekah; 13. Bye, Michael; 14. Creswick, E.R.; 15. Boyd, Matthew; 16. Venigalla, Mahitha; 17. Laforge, Evan; 18. Purdy, Jon; 19. Kamath, Purushotham; 20. Maheshwari, Dinesh; 21. Beidler, Michael; 22. Rosseel, Geert; 23. Ahmad, Omar; 24. Gagarin, Gleb; 25. Czekalski, Richard; 26. Rane, Ashay; 27. Parmar, Sahil; 28. Werner, Jeff; 29. Sproch, Jim; 30. Macias, Adrian; 31. Kurtz, Brian. Organization for all authors: Groq, Inc., Mountain View, California |
| ContentType | Conference Proceeding |
| DOI | 10.1109/ISCA45697.2020.00023 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library; IEEE Proceedings Order Plans (POP) 1998-present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 1728146615 9781728146614 |
| EndPage | 158 |
| ExternalDocumentID | 9138986 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 62 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 14 |
| ParticipantIDs | ieee_primary_9138986 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-May |
| PublicationDateYYYYMMDD | 2020-05-01 |
| PublicationDate_xml | – month: 05 year: 2020 text: 2020-May |
| PublicationDecade | 2020 |
| PublicationTitle | 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) |
| PublicationTitleAbbrev | ISCA |
| PublicationYear | 2020 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| Snippet | In this paper, we introduce the Tensor Streaming Processor (TSP) architecture, a functionally-sliced microarchitecture with memory units interleaved with... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 145 |
| SubjectTerms | Computer architecture; Data models; Deep learning; Hardware; Microarchitecture; Parallel processing; System-on-chip; Tensors; Transistors; Vectors |
| Title | Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads |
| URI | https://ieeexplore.ieee.org/document/9138986 |
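
The headline figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical and do not come from the paper, and the inputs are just the numbers stated in the abstract (20.4K images per second at batch size one, a 25 × 29 mm die, more than 1 TeraOp/s per square mm, and a 900 MHz nominal clock). Because the abstract gives the density as a lower bound, the derived aggregate throughput and operations per cycle are lower bounds too. Reading the per-image time as a latency additionally assumes that consecutive images do not overlap in the pipeline, which the abstract does not state.

```python
# Back-of-the-envelope checks using only numbers quoted in the abstract.
# All names below are hypothetical helpers for illustration, not from the paper.


def per_image_time_us(images_per_second: float) -> float:
    """Average time per image in microseconds at a given throughput.

    This equals end-to-end latency only if images are processed strictly
    one after another (an assumption, not a claim from the abstract).
    """
    return 1e6 / images_per_second


def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Die area in square millimetres for a rectangular chip."""
    return width_mm * height_mm


def min_aggregate_teraops(area_mm2: float, teraops_per_mm2: float) -> float:
    """Lower bound on chip-wide TeraOp/s implied by a density lower bound."""
    return area_mm2 * teraops_per_mm2


def min_ops_per_cycle(teraops: float, clock_hz: float) -> float:
    """Lower bound on operations per clock cycle at the given frequency."""
    return teraops * 1e12 / clock_hz


if __name__ == "__main__":
    area = die_area_mm2(25, 29)
    teraops = min_aggregate_teraops(area, 1.0)
    print(f"time per image: {per_image_time_us(20_400):.1f} us")
    print(f"die area: {area} mm^2")
    print(f"aggregate throughput: > {teraops:.0f} TeraOp/s")
    print(f"ops per cycle at 900 MHz: > {min_ops_per_cycle(teraops, 900e6):,.0f}")
```

Under these assumptions the throughput corresponds to roughly 49 microseconds per image, and the density figure implies more than 725 TeraOp/s across the 725 square mm die.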