nvshare: Practical GPU Sharing Without Memory Size Constraints

GPUs are essential for accelerating Machine Learning (ML) workloads. A common practice is deploying ML jobs as containers managed by an orchestrator such as Kubernetes. Kubernetes schedules GPU workloads by exclusively assigning a device to a single job, which leads to massive GPU underutilization,...

Published in: Proceedings (IEEE/ACM International Conference on Software Engineering Companion. Online), pp. 16-20
Main authors: Alexopoulos, Georgios; Mitropoulos, Dimitris
Format: Conference paper
Language: English
Published: ACM, 14 April 2024
ISSN: 2574-1934
Abstract GPUs are essential for accelerating Machine Learning (ML) workloads. A common practice is deploying ML jobs as containers managed by an orchestrator such as Kubernetes. Kubernetes schedules GPU workloads by exclusively assigning a device to a single job, which leads to massive GPU underutilization, especially for interactive development jobs with significant idle periods. Current GPU sharing approaches assign a fraction of GPU memory to each co-located job to avoid memory contention and out-of-memory errors. However, this is impractical, as it requires a priori knowledge of memory usage and does not fully address GPU underutilization. We propose nvshare, which transparently enables page faults (i.e., exceptions that are raised when an entity attempts to access a resource) to allow virtual GPU memory oversubscription. In this way we permit each application to utilize the entire physical GPU memory (Video RAM). To prevent thrashing (a situation in which page faults dominate execution time) in a reliable manner, nvshare serializes overlapping GPU bursts from different applications. We compared nvshare with KubeShare, a state-of-the-art GPU sharing solution. Our results indicate that both perform equally well in conventional sharing cases where total GPU memory usage fits into VRAM. For memory oversubscription scenarios, which KubeShare does not support, nvshare outperforms the sequential execution baseline by up to 1.35x. A video of nvshare is available at https://www.youtube.com/watch?v=9n-5scSAICY
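The abstract's key scheduling idea can be illustrated with a minimal, hypothetical sketch: co-located jobs take turns holding an exclusive token for the duration of each GPU burst, so bursts never overlap and page-fault thrashing under memory oversubscription is avoided. This is only an illustration of the concept; names such as `run_job` and `gpu_token` are invented here and are not part of nvshare's actual code or API.

```python
import threading
import time

# Hypothetical sketch: serialize overlapping "GPU bursts" from
# co-located jobs behind one exclusive token, so only one job's
# kernels touch VRAM-resident pages at a time.
gpu_token = threading.Lock()
completed = []

def run_job(name, bursts, burst_seconds=0.01):
    for _ in range(bursts):
        with gpu_token:                # exclusive GPU access for one burst
            time.sleep(burst_seconds)  # stand-in for kernel execution
        # Between bursts the job does CPU-side work; the token is free,
        # so another job's burst can run and the GPU stays busy.
    completed.append(name)

jobs = [threading.Thread(target=run_job, args=(n, 3)) for n in ("job-a", "job-b")]
for t in jobs:
    t.start()
for t in jobs:
    t.join()
print(sorted(completed))  # ['job-a', 'job-b']
```

Both jobs finish, and at no point do two bursts execute concurrently; this is the property that lets each application see the entire VRAM without thrashing.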
Author Alexopoulos, Georgios
Mitropoulos, Dimitris
Author_xml – sequence: 1
  givenname: Georgios
  surname: Alexopoulos
  fullname: Alexopoulos, Georgios
  email: grgalex@ba.uoa.gr
  organization: University of Athens and National Infrastructures for Research and Technology, Greece
– sequence: 2
  givenname: Dimitris
  surname: Mitropoulos
  fullname: Mitropoulos, Dimitris
  email: dimitro@ba.uoa.gr
  organization: University of Athens and National Infrastructures for Research and Technology, Greece
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1145/3639478.3640034
EISBN 9798400705021
EISSN 2574-1934
EndPage 20
ExternalDocumentID 10554963
Genre orig-research
ISICitedReferencesCount 1
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
OpenAccessLink https://ieeexplore.ieee.org/document/10554963
PageCount 5
PublicationDate 2024-April-14
PublicationTitle Proceedings (IEEE/ACM International Conference on Software Engineering Companion. Online)
PublicationTitleAbbrev ICSE-COMPANION
PublicationYear 2024
Publisher ACM
StartPage 16
SubjectTerms Containers
Graphics Processing Unit
Graphics processing units
Machine learning
Memory management
Random access memory
Reliability
Resource Sharing
Schedules
Title nvshare: Practical GPU Sharing Without Memory Size Constraints
URI https://ieeexplore.ieee.org/document/10554963