PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units

Bibliographic Details
Published in: Proceedings - International Symposium on High-Performance Computer Architecture, pp. 220-233
Main Authors: Choi, Yujeong; Rhu, Minsoo
Format: Conference Proceeding
Language: English
Published: IEEE 01.02.2020
ISSN:2378-203X
Abstract: To amortize cost, cloud vendors providing DNN acceleration as a service to end-users employ consolidation and virtualization to share the underlying resources among multiple DNN service requests. This paper makes a case for a "preemptible" neural processing unit (NPU) and a "predictive" multi-task scheduler to meet the latency demands of high-priority inference while maintaining high throughput. We evaluate both the mechanisms that enable NPUs to be preemptible and the policies that utilize them to meet scheduling objectives. We show that preemptive NPU multi-tasking can achieve an average 7.8×, 1.4×, and 4.8× improvement in latency, throughput, and SLA satisfaction, respectively.
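Illustrative sketch (not from the paper): this record gives no algorithmic detail, so the Python toy below is not the PREMA algorithm. It only illustrates the general idea the abstract points at, namely that letting a high-priority inference request preempt a running low-priority one can cut its latency. The `Task` fields, the `simulate` function, the "larger priority value is more urgent" convention, and the assumption that predicted run times are exact are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    arrival: int    # time step at which the request arrives
    priority: int   # larger value = more urgent (assumption of this sketch)
    predicted: int  # predicted run time in time steps (assumed exact here)


def simulate(tasks, preemptive):
    """Return {task name: completion time} on one shared accelerator.

    Toy model: one time step of work per loop iteration and zero
    context-switch cost, which a real NPU would not have.
    """
    remaining = {t.name: t.predicted for t in tasks}
    finish = {}
    time = 0
    running = None
    while len(finish) < len(tasks):
        ready = [t for t in tasks
                 if t.arrival <= time and t.name not in finish]
        if not ready:
            time += 1  # accelerator idles until the next arrival
            continue
        # Re-pick every step when preemptive; otherwise only when idle.
        if preemptive or running is None or running.name in finish:
            # Highest priority first; shortest predicted remaining time
            # breaks ties, then earliest arrival.
            running = min(
                ready,
                key=lambda t: (-t.priority, remaining[t.name], t.arrival),
            )
        remaining[running.name] -= 1
        time += 1
        if remaining[running.name] == 0:
            finish[running.name] = time
    return finish
```

With a 10-step low-priority task arriving at time 0 and a 3-step high-priority task arriving at time 2, the high-priority task completes at time 5 under preemption instead of time 13 without it, while the low-priority task finishes at 13 instead of 10. The sketch ignores the cost of saving and restoring accelerator state, which the abstract's "mechanisms" would have to address.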
Author Affiliations: Choi, Yujeong (Korea Advanced Institute of Science and Technology); Rhu, Minsoo (Korea Advanced Institute of Science and Technology)
DOI 10.1109/HPCA47549.2020.00027
Discipline Computer Science
EISBN 1728161495
9781728161495
EISSN 2378-203X
EndPage 233
ExternalDocumentID 9065590
Genre orig-research
ISICitedReferencesCount 110
IsPeerReviewed false
IsScholarly true
Language English
PageCount 14
PublicationCentury 2000
PublicationDate 2020-Feb.
PublicationDateYYYYMMDD 2020-02-01
PublicationDate_xml – month: 02
  year: 2020
  text: 2020-Feb.
PublicationDecade 2020
PublicationTitle Proceedings - International Symposium on High-Performance Computer Architecture
PublicationTitleAbbrev HPCA
PublicationYear 2020
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 220
SubjectTerms Computer architecture
Google
Graphics processing units
Random access memory
Servers
Task analysis
Throughput
Title PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units
URI https://ieeexplore.ieee.org/document/9065590