PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units
To amortize cost, cloud vendors providing DNN acceleration as a service to end-users employ consolidation and virtualization to share the underlying resources among multiple DNN service requests. This paper makes a case for a "preemptible" neural processing unit (NPU) and a "predictive" multi-task scheduler to meet the latency demands of high-priority inference while maintaining high throughput. We evaluate both the mechanisms that enable NPUs to be preemptible and the policies that utilize them to meet scheduling objectives. We show that preemptive NPU multi-tasking can achieve an average 7.8×, 1.4×, and 4.8× improvement in latency, throughput, and SLA satisfaction, respectively.
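To give a concrete flavor of the idea in the abstract, here is a minimal sketch, assuming a hypothetical `Task` record with a priority and a predicted execution time: a scheduler that preempts the running DNN job when a more latency-critical request arrives, and breaks ties among equal-priority jobs by dispatching the shorter predicted one. This is an illustration of the general concept only, not the paper's actual PREMA policy.

```python
# A minimal sketch of a predictive, priority-aware preemptive NPU scheduler.
# The Task fields and the decision rule below are illustrative assumptions,
# not the PREMA algorithm described in the paper.
import heapq
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    priority: int                # higher value = more latency-critical
    predicted_latency_ms: float  # estimated NPU execution time


class PreemptiveNPUScheduler:
    def __init__(self) -> None:
        self._counter = itertools.count()    # tie-breaker so heap entries never compare Tasks
        self._queue: list = []               # min-heap keyed by (-priority, predicted latency)
        self._running: Optional[Task] = None

    def _push(self, task: Task) -> None:
        key = (-task.priority, task.predicted_latency_ms, next(self._counter))
        heapq.heappush(self._queue, (key, task))

    def submit(self, task: Task) -> None:
        self._push(task)
        # Preempt the running job (re-queue it) if a strictly higher-priority one arrives.
        if self._running is not None and task.priority > self._running.priority:
            self._push(self._running)
            self._running = None

    def dispatch(self) -> Optional[Task]:
        # Highest priority first; among equals, the shortest predicted job runs next.
        if self._running is None and self._queue:
            _, self._running = heapq.heappop(self._queue)
        return self._running


# Usage: a low-priority batch job is preempted by a latency-critical request.
sched = PreemptiveNPUScheduler()
sched.submit(Task("resnet_batch", priority=1, predicted_latency_ms=40.0))
assert sched.dispatch().name == "resnet_batch"
sched.submit(Task("speech_rt", priority=3, predicted_latency_ms=5.0))
assert sched.dispatch().name == "speech_rt"
```

In the usage lines, the low-priority batch job is pushed back into the queue as soon as the latency-critical request arrives, which is the kind of preemptive behavior the abstract's latency and SLA results are about.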
Saved in:
| Published in: | Proceedings - International Symposium on High-Performance Computer Architecture, pp. 220 - 233 |
|---|---|
| Main Authors: | Choi, Yujeong; Rhu, Minsoo |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 1 February 2020 |
| Subjects: | Computer architecture; Graphics processing units; Random access memory; Servers; Task analysis; Throughput |
| ISSN: | 2378-203X |
| Online Access: | Full text |
| Author | Choi, Yujeong; Rhu, Minsoo (Korea Advanced Institute of Science and Technology) |
|---|---|
| ContentType | Conference Proceeding |
| DOI | 10.1109/HPCA47549.2020.00027 |
| Discipline | Computer Science |
| EISBN | 1728161495; 9781728161495 |
| EISSN | 2378-203X |
| EndPage | 233 |
| ExternalDocumentID | 9065590 |
| Genre | orig-research |
| ISICitedReferencesCount | 110 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 14 |
| PublicationDate | 2020-Feb. |
| PublicationTitle | Proceedings - International Symposium on High-Performance Computer Architecture |
| PublicationTitleAbbrev | HPCA |
| PublicationYear | 2020 |
| Publisher | IEEE |
| StartPage | 220 |
| SubjectTerms | Computer architecture; Graphics processing units; Random access memory; Servers; Task analysis; Throughput |
| Title | PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units |
| URI | https://ieeexplore.ieee.org/document/9065590 |