Dynamic Buffer Management in Massively Parallel Systems: The Power of Randomness
| Published in: | ACM Transactions on Parallel Computing, Volume 12, Issue 1 |
|---|---|
| Medium: | Journal Article |
| Language: | English |
| Published: | 2025-02-11 |
| ISSN: | 2329-4957 |
| Online access: | See access details |
| Abstract | Massively parallel systems, such as Graphics Processing Units (GPUs), play an increasingly crucial role in today's data-intensive computing. The unique challenges associated with developing system software for massively parallel hardware to support numerous parallel threads efficiently are of paramount importance. One such challenge is the design of a dynamic memory allocator to allocate memory at runtime. Traditionally, memory allocators have relied on maintaining a global data structure, such as a queue of free pages. However, in the context of massively parallel systems, accessing such global data structures can quickly become a bottleneck even with multiple queues in place. This paper presents a novel approach to dynamic memory allocation that eliminates the need for a centralized data structure. Our proposed approach revolves around letting threads employ random search procedures to locate free pages. Through mathematical proofs and extensive experiments, we demonstrate that the basic random search design achieves lower latency than the best-known existing solution in most situations. Furthermore, we develop more advanced techniques and algorithms to tackle the challenge of warp divergence and further enhance performance when free memory is limited. Building upon these advancements, our mathematical proofs and experimental results affirm that these advanced designs can yield an order of magnitude improvement over the basic design and consistently outperform the state-of-the-art by up to two orders of magnitude. To illustrate the practical implications of our work, we integrate our memory management techniques into two GPU algorithms: a hash join and a group-by. Both case studies provide compelling evidence of our approach's pronounced performance gains. |
|---|---|
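The core idea described in the abstract, replacing a shared free-page queue with per-thread random probing, can be illustrated with a minimal sketch. Everything below (the function name, the sequential list standing in for device memory, the flag update emulating an atomic compare-and-swap) is an assumption for illustration, not the authors' implementation.

```python
import random

def random_search_alloc(page_state, rng, max_tries=10_000):
    """Sketch of random-search allocation: instead of dequeuing from a
    global free-page queue, a thread probes pages uniformly at random
    until it claims a free one. On a real GPU the claim below would be
    an atomicCAS on device memory; here a plain flag emulates it."""
    n = len(page_state)
    for _ in range(max_tries):
        i = rng.randrange(n)
        if not page_state[i]:      # page looks free
            page_state[i] = True   # claim it (atomicCAS on a GPU)
            return i
    return None                    # gave up: almost no free pages left

# With a fraction f of pages free, each probe succeeds with probability f,
# so a thread needs about 1/f probes in expectation -- and threads never
# contend on a central data structure, only on individual pages.
rng = random.Random(42)
pages = [False] * 1024             # all 1024 pages initially free
claimed = [random_search_alloc(pages, rng) for _ in range(512)]
```

Each call claims a distinct page because the shared state is updated on success; the interesting regime, which the paper's advanced designs target, is when the free fraction f is small and 1/f probes become expensive.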
| Authors: | Minh Pham, Yongke Yuan, Hao Li, Chengcheng Mou, Yicheng Tu, Zichen Xu, Jinghan Meng |
| Affiliations: | Minh Pham, Hao Li, Chengcheng Mou, Yicheng Tu, Jinghan Meng (University of South Florida, USA); Yongke Yuan (Beijing University of Technology, China); Zichen Xu (Nanchang University, China) |
| DOI | 10.1145/3701623 |
| Discipline | Computer Science |
| EISSN | 2329-4957 |
| Funding: | NIGMS NIH HHS, grant R01 GM140316 |
| GroupedDBID | 4.4 5VS AAKMM AALFJ AAYFX ACM ADBCU AEBYY AEFXT AEJOY AENSD AFWIH AFWXC AIKLT AKRVB ALMA_UNASSIGNED_HOLDINGS ASPBG AVWKF BDXCO CCLIF EBS EJD GUFHI LHSKQ NPM ROL 7X8 |
| ID | FETCH-LOGICAL-a351t-ed4a9efe3a652eebb41b70706f867752abe35851648a64c6a7468fedcb4b03782 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 0 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001430995000001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| Open access: | Yes |
| Peer reviewed: | Yes |
| Keywords: | memory allocation; buffer management; Computing methodologies → Massively parallel algorithms; random algorithm; GPU; parallel computing; shared memory algorithms |
| OpenAccessLink | https://dl.acm.org/doi/pdf/10.1145/3701623 |
| PMID | 39990623 |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/39990623 https://www.proquest.com/docview/3170275866 |