nvshare: Practical GPU Sharing Without Memory Size Constraints

GPUs are essential for accelerating Machine Learning (ML) workloads. A common practice is deploying ML jobs as containers managed by an orchestrator such as Kubernetes. Kubernetes schedules GPU workloads by exclusively assigning a device to a single job, which leads to massive GPU underutilization,...

Published in: Proceedings (IEEE/ACM International Conference on Software Engineering Companion. Online), pp. 16-20
Main authors: Alexopoulos, Georgios; Mitropoulos, Dimitris
Format: Conference paper
Language: English
Published: ACM, 14 April 2024
ISSN: 2574-1934
Abstract GPUs are essential for accelerating Machine Learning (ML) workloads. A common practice is deploying ML jobs as containers managed by an orchestrator such as Kubernetes. Kubernetes schedules GPU workloads by exclusively assigning a device to a single job, which leads to massive GPU underutilization, especially for interactive development jobs with significant idle periods. Current GPU sharing approaches assign a fraction of GPU memory to each co-located job to avoid memory contention and out-of-memory errors. However, this is impractical, as it requires a priori knowledge of memory usage and does not fully address GPU underutilization. We propose nvshare, which transparently enables page faults (i.e., exceptions that are raised when an entity attempts to access a resource) to allow virtual GPU memory oversubscription. In this way we permit each application to utilize the entire physical GPU memory (Video RAM). To prevent thrashing (a situation in which page faults dominate execution time) in a reliable manner, nvshare serializes overlapping GPU bursts from different applications. We compared nvshare with KubeShare, a state-of-the-art GPU sharing solution. Our results indicate that both perform equally well in conventional sharing cases where total GPU memory usage fits into VRAM. For memory oversubscription scenarios, which KubeShare does not support, nvshare outperforms the sequential execution baseline by up to 1.35x. A video of nvshare is available at https://www.youtube.com/watch?v=9n-5scSAICY
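The abstract's key scheduling idea can be illustrated with a minimal, hypothetical sketch: co-located jobs take turns holding an exclusive token for the duration of each GPU burst, so bursts never overlap and page-fault thrashing under memory oversubscription is avoided. This is only an illustration of the concept; names such as `run_job` and `gpu_token` are invented here and are not part of nvshare's actual code or API.

```python
import threading
import time

# Hypothetical sketch: serialize overlapping "GPU bursts" from
# co-located jobs behind one exclusive token, so only one job's
# kernels touch VRAM-resident pages at a time.
gpu_token = threading.Lock()
completed = []

def run_job(name, bursts, burst_seconds=0.01):
    for _ in range(bursts):
        with gpu_token:                # exclusive GPU access for one burst
            time.sleep(burst_seconds)  # stand-in for kernel execution
        # Between bursts the job does CPU-side work; the token is free,
        # so another job's burst can run and the GPU stays busy.
    completed.append(name)

jobs = [threading.Thread(target=run_job, args=(n, 3)) for n in ("job-a", "job-b")]
for t in jobs:
    t.start()
for t in jobs:
    t.join()
print(sorted(completed))  # ['job-a', 'job-b']
```

Both jobs finish, and at no point do two bursts execute concurrently; this is the property that lets each application see the entire VRAM without thrashing.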
Author Alexopoulos, Georgios
Mitropoulos, Dimitris
Author_xml – sequence: 1
  givenname: Georgios
  surname: Alexopoulos
  fullname: Alexopoulos, Georgios
  email: grgalex@ba.uoa.gr
  organization: University of Athens and National Infrastructures for Research and Technology, Greece
– sequence: 2
  givenname: Dimitris
  surname: Mitropoulos
  fullname: Mitropoulos, Dimitris
  email: dimitro@ba.uoa.gr
  organization: University of Athens and National Infrastructures for Research and Technology, Greece
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1145/3639478.3640034
EISBN 9798400705021
EISSN 2574-1934
EndPage 20
ExternalDocumentID 10554963
Genre orig-research
ISICitedReferencesCount 1
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
OpenAccessLink https://ieeexplore.ieee.org/document/10554963
PageCount 5
PublicationDate 2024-April-14
PublicationTitle Proceedings (IEEE/ACM International Conference on Software Engineering Companion. Online)
PublicationTitleAbbrev ICSE-COMPANION
PublicationYear 2024
Publisher ACM
StartPage 16
SubjectTerms Containers
Graphics Processing Unit
Graphics processing units
Machine learning
Memory management
Random access memory
Reliability
Resource Sharing
Schedules
Title nvshare: Practical GPU Sharing Without Memory Size Constraints
URI https://ieeexplore.ieee.org/document/10554963