PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units

Bibliographic details
Published in: Proceedings - International Symposium on High-Performance Computer Architecture, pp. 220-233
Main authors: Choi, Yujeong; Rhu, Minsoo
Format: Conference proceeding
Language: English
Published: IEEE, 2020-02-01
ISSN: 2378-203X
Online access: Full text
Abstract To amortize cost, cloud vendors providing DNN acceleration as a service to end-users employ consolidation and virtualization to share the underlying resources among multiple DNN service requests. This paper makes a case for a "preemptible" neural processing unit (NPU) and a "predictive" multi-task scheduler to meet the latency demands of high-priority inference while maintaining high throughput. We evaluate both the mechanisms that enable NPUs to be preemptible and the policies that utilize them to meet scheduling objectives. We show that preemptive NPU multi-tasking can achieve an average 7.8×, 1.4×, and 4.8× improvement in latency, throughput, and SLA satisfaction, respectively.
Author Choi, Yujeong
Rhu, Minsoo
Author_xml – sequence: 1
  givenname: Yujeong
  surname: Choi
  fullname: Choi, Yujeong
  organization: Korea Advanced Institute of Science and Technology
– sequence: 2
  givenname: Minsoo
  surname: Rhu
  fullname: Rhu, Minsoo
  organization: Korea Advanced Institute of Science and Technology
ContentType Conference Proceeding
DOI 10.1109/HPCA47549.2020.00027
Discipline Computer Science
EISBN 1728161495
9781728161495
EISSN 2378-203X
EndPage 233
ExternalDocumentID 9065590
Genre orig-research
ISICitedReferencesCount 110
IsPeerReviewed false
IsScholarly true
Language English
PageCount 14
PublicationCentury 2000
PublicationDate 2020-Feb.
PublicationDateYYYYMMDD 2020-02-01
PublicationDate_xml – month: 02
  year: 2020
  text: 2020-Feb.
PublicationDecade 2020
PublicationTitle Proceedings - International Symposium on High-Performance Computer Architecture
PublicationTitleAbbrev HPCA
PublicationYear 2020
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 220
SubjectTerms Computer architecture
Google
Graphics processing units
Random access memory
Servers
Task analysis
Throughput
Title PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units
URI https://ieeexplore.ieee.org/document/9065590