PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units

Bibliographic details
Published in:	Proceedings - International Symposium on High-Performance Computer Architecture, pp. 220-233
Main authors:	Choi, Yujeong; Rhu, Minsoo
Format:	Conference proceeding
Language:	English
Published:	IEEE, 2020-02-01
Subjects:	Computer architecture; Google; Graphics processing units; Random access memory; Servers; Task analysis; Throughput
ISSN:	2378-203X
Online access:	Full text

Abstract	To amortize cost, cloud vendors providing DNN acceleration as a service to end-users employ consolidation and virtualization to share the underlying resources among multiple DNN service requests. This paper makes a case for a "preemptible" neural processing unit (NPU) and a "predictive" multi-task scheduler to meet the latency demands of high-priority inference while maintaining high throughput. We evaluate both the mechanisms that enable NPUs to be preemptible and the policies that utilize them to meet scheduling objectives. We show that preemptive NPU multi-tasking can achieve an average 7.8×, 1.4×, and 4.8× improvement in latency, throughput, and SLA satisfaction, respectively.
Author	Choi, Yujeong; Rhu, Minsoo
Author_xml	– sequence: 1 givenname: Yujeong surname: Choi fullname: Choi, Yujeong organization: Korea Advanced Institute of Science and Technology – sequence: 2 givenname: Minsoo surname: Rhu fullname: Rhu, Minsoo organization: Korea Advanced Institute of Science and Technology
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/HPCA47549.2020.00027
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISBN	1728161495 9781728161495
EISSN	2378-203X
EndPage	233
ExternalDocumentID	9065590
Genre	orig-research
GroupedDBID	29O 6IE 6IF 6IH 6IK 6IL 6IM 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI M43 OCL RIE RIL RNS
ID	FETCH-LOGICAL-i203t-a037e4af656f93ab3faa44eb0d4f8321641128e1247a7aaccde5c47986f7fab03
IEDL.DBID	RIE
ISICitedReferencesCount	110
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000531494100017&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate	Wed Aug 27 02:41:43 EDT 2025
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i203t-a037e4af656f93ab3faa44eb0d4f8321641128e1247a7aaccde5c47986f7fab03
PageCount	14
ParticipantIDs	ieee_primary_9065590
PublicationCentury	2000
PublicationDate	2020-Feb.
PublicationDateYYYYMMDD	2020-02-01
PublicationDate_xml	– month: 02 year: 2020 text: 2020-Feb.
PublicationDecade	2020
PublicationTitle	Proceedings - International Symposium on High-Performance Computer Architecture
PublicationTitleAbbrev	HPCA
PublicationYear	2020
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssj0002951
Score	2.510809
SourceID	ieee
SourceType	Publisher
StartPage	220
SubjectTerms	Computer architecture; Google; Graphics processing units; Random access memory; Servers; Task analysis; Throughput
Title	PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units
URI	https://ieeexplore.ieee.org/document/9065590
WOSCitedRecordID	wos000531494100017&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
linkProvider	IEEE