nvshare: Practical GPU Sharing Without Memory Size Constraints

Bibliographic details
Published in: Proceedings of the IEEE/ACM International Conference on Software Engineering Companion (ICSE-COMPANION), pp. 16-20
Main authors: Alexopoulos, Georgios; Mitropoulos, Dimitris
Format: Conference paper
Language: English
Published: ACM, 14 April 2024
ISSN: 2574-1934
Online access: full text
Abstract: GPUs are essential for accelerating Machine Learning (ML) workloads. A common practice is deploying ML jobs as containers managed by an orchestrator such as Kubernetes. Kubernetes schedules GPU workloads by exclusively assigning a device to a single job, which leads to massive GPU underutilization, especially for interactive development jobs with significant idle periods. Current GPU sharing approaches assign a fraction of GPU memory to each co-located job to avoid memory contention and out-of-memory errors. However, this is impractical, as it requires a priori knowledge of memory usage and does not fully address GPU underutilization. We propose nvshare, which transparently enables page faults (i.e., exceptions raised when an entity attempts to access a resource) to allow virtual GPU memory oversubscription. In this way, we permit each application to utilize the entire physical GPU memory (Video RAM). To prevent thrashing (a situation in which page faults dominate execution time) in a reliable manner, nvshare serializes overlapping GPU bursts from different applications. We compared nvshare with KubeShare, a state-of-the-art GPU sharing solution. Our results indicate that both perform equally well in conventional sharing cases where total GPU memory usage fits into VRAM. For memory oversubscription scenarios, which KubeShare does not support, nvshare outperforms the sequential execution baseline by up to 1.35x. A video of nvshare is available at https://www.youtube.com/watch?v=9n-5scSAICY
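The abstract's anti-thrashing mechanism, serializing overlapping GPU bursts from co-located applications, can be illustrated with a toy scheduler. This is a hypothetical sketch of the concept only, not nvshare's actual implementation; the `BurstSerializer` class and its names are invented for illustration, and a simple lock stands in for exclusive GPU access:

```python
import threading
import time

class BurstSerializer:
    """Toy model of the anti-thrashing idea described in the abstract:
    when GPU memory is oversubscribed, overlapping GPU bursts from
    different apps are serialized so only one app's working set needs
    to be resident at a time."""

    def __init__(self):
        self._gpu_lock = threading.Lock()  # stands in for exclusive GPU use
        self.completed = []

    def run_burst(self, app_name, burst_fn):
        # Block until no other app's burst holds the GPU, then run.
        with self._gpu_lock:
            burst_fn()
            self.completed.append(app_name)

sched = BurstSerializer()

def app(name, num_bursts):
    for _ in range(num_bursts):
        sched.run_burst(name, lambda: time.sleep(0.001))  # fake GPU kernel

threads = [threading.Thread(target=app, args=(f"app{i}", 3)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(sched.completed))  # 6: every burst ran, each with exclusive GPU access
```

Because bursts never overlap in this model, no two working sets compete for physical VRAM at the same time, which is why serialization avoids page-fault thrashing under oversubscription.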
Authors:
Georgios Alexopoulos (grgalex@ba.uoa.gr), University of Athens and National Infrastructures for Research and Technology, Greece
Dimitris Mitropoulos (dimitro@ba.uoa.gr), University of Athens and National Infrastructures for Research and Technology, Greece
DOI: 10.1145/3639478.3640034
EISBN: 9798400705021
EISSN: 2574-1934
End page: 20
IEEE Xplore document ID: 10554963
Genre: original research
Open access: yes
Open access link: https://ieeexplore.ieee.org/document/10554963
Page count: 5
Publication date: 14 April 2024
Publication title: Proceedings of the IEEE/ACM International Conference on Software Engineering Companion (ICSE-COMPANION)
Publication year: 2024
Publisher: ACM
Start page: 16
Subject terms: Containers; Graphics processing units; Machine learning; Memory management; Random access memory; Reliability; Resource sharing; Schedules