nvshare: Practical GPU Sharing Without Memory Size Constraints
| Published in: | Proceedings (IEEE/ACM International Conference on Software Engineering Companion. Online), pp. 16-20 |
|---|---|
| Main Authors: | Georgios Alexopoulos, Dimitris Mitropoulos |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | ACM, 14 April 2024 |
| ISSN: | 2574-1934 |
| Abstract | GPUs are essential for accelerating Machine Learning (ML) workloads. A common practice is deploying ML jobs as containers managed by an orchestrator such as Kubernetes. Kubernetes schedules GPU workloads by exclusively assigning a device to a single job, which leads to massive GPU underutilization, especially for interactive development jobs with significant idle periods. Current GPU sharing approaches assign a fraction of GPU memory to each co-located job to avoid memory contention and out-of-memory errors. However, this is impractical, as it requires a priori knowledge of memory usage and does not fully address GPU underutilization. We propose nvshare, which transparently enables page faults (i.e., exceptions that are raised when an entity attempts to access a resource) to allow virtual GPU memory oversubscription. In this way we permit each application to utilize the entire physical GPU memory (Video RAM). To prevent thrashing (a situation in which page faults dominate execution time) in a reliable manner, nvshare serializes overlapping GPU bursts from different applications. We compared nvshare with KubeShare, a state-of-the-art GPU sharing solution. Our results indicate that both perform equally well in conventional sharing cases where total GPU memory usage fits into VRAM. For memory oversubscription scenarios, which KubeShare does not support, nvshare outperforms the sequential execution baseline by up to 1.35x. A video of nvshare is available at https://www.youtube.com/watch?v=9n-5scSAICY |
|---|---|
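The abstract's core scheduling idea is that overlapping GPU bursts from co-located jobs are serialized, so only one job's kernels touch the (oversubscribed) GPU memory at a time. The toy Python sketch below illustrates that principle only; it is not nvshare's actual implementation, and the names (`gpu_burst`, `burst_lock`) are hypothetical.

```python
import threading
import time

# Illustrative sketch (NOT nvshare's code): co-located jobs acquire a
# global lock around each "GPU burst", so bursts from different jobs
# run one at a time instead of thrashing oversubscribed GPU memory.
burst_lock = threading.Lock()
active = 0        # bursts currently "on the GPU"
max_active = 0    # peak concurrency observed
state_lock = threading.Lock()

def gpu_burst(job_id: int, duration: float) -> None:
    """Simulate one job's GPU burst; sleep stands in for kernel time."""
    global active, max_active
    with burst_lock:              # serialize overlapping bursts
        with state_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(duration)      # stand-in for kernel execution
        with state_lock:
            active -= 1

threads = [threading.Thread(target=gpu_burst, args=(i, 0.01)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With serialization in place, at most one burst runs at any moment.
```

In this model, memory oversubscription is safe because a job never competes for VRAM mid-burst; the cost is that fully parallel execution is given up whenever bursts would overlap, which matches the paper's reported speedup being relative to a sequential baseline.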
| Authors | Georgios Alexopoulos (grgalex@ba.uoa.gr) and Dimitris Mitropoulos (dimitro@ba.uoa.gr), University of Athens and National Infrastructures for Research and Technology, Greece |
| CODEN | IEEPAD |
| DOI | 10.1145/3639478.3640034 |
| EISBN | 9798400705021 |
| EISSN | 2574-1934 |
| EndPage | 20 |
| ExternalDocumentID | 10554963 |
| Genre | orig-research |
| ISICitedReferencesCount | 1 |
| OpenAccessLink | https://ieeexplore.ieee.org/document/10554963 |
| PageCount | 5 |
| PublicationTitleAbbrev | ICSE-COMPANION |
| StartPage | 16 |
| SubjectTerms | Containers; Graphics processing units; Machine learning; Memory management; Random access memory; Reliability; Resource sharing; Schedules |
| URI | https://ieeexplore.ieee.org/document/10554963 |