Dynamic Buffer Management in Massively Parallel Systems: The Power of Randomness
| Published in: | ACM Transactions on Parallel Computing, Volume 12, Issue 1 |
|---|---|
| Medium: | Journal Article |
| Language: | English |
| Published: | 2025-02-11 |
| ISSN: | 2329-4957 |
| Online access: | See access details |
| Abstract | Massively parallel systems, such as Graphics Processing Units (GPUs), play an increasingly crucial role in today's data-intensive computing. The unique challenges associated with developing system software for massively parallel hardware to support numerous parallel threads efficiently are of paramount importance. One such challenge is the design of a dynamic memory allocator to allocate memory at runtime. Traditionally, memory allocators have relied on maintaining a global data structure, such as a queue of free pages. However, in the context of massively parallel systems, accessing such global data structures can quickly become a bottleneck even with multiple queues in place. This paper presents a novel approach to dynamic memory allocation that eliminates the need for a centralized data structure. Our proposed approach revolves around letting threads employ random search procedures to locate free pages. Through mathematical proofs and extensive experiments, we demonstrate that the basic random search design achieves lower latency than the best-known existing solution in most situations. Furthermore, we develop more advanced techniques and algorithms to tackle the challenge of warp divergence and further enhance performance when free memory is limited. Building upon these advancements, our mathematical proofs and experimental results affirm that these advanced designs can yield an order of magnitude improvement over the basic design and consistently outperform the state-of-the-art by up to two orders of magnitude. To illustrate the practical implications of our work, we integrate our memory management techniques into two GPU algorithms: a hash join and a group-by. Both case studies provide compelling evidence of our approach's pronounced performance gains. |
|---|---|
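The core idea described in the abstract, replacing a shared free-page queue with per-thread random probing, can be illustrated with a minimal sketch. Everything below (the function name, the sequential list standing in for device memory, the flag update emulating an atomic compare-and-swap) is an assumption for illustration, not the authors' implementation.

```python
import random

def random_search_alloc(page_state, rng, max_tries=10_000):
    """Sketch of random-search allocation: instead of dequeuing from a
    global free-page queue, a thread probes pages uniformly at random
    until it claims a free one. On a real GPU the claim below would be
    an atomicCAS on device memory; here a plain flag emulates it."""
    n = len(page_state)
    for _ in range(max_tries):
        i = rng.randrange(n)
        if not page_state[i]:      # page looks free
            page_state[i] = True   # claim it (atomicCAS on a GPU)
            return i
    return None                    # gave up: almost no free pages left

# With a fraction f of pages free, each probe succeeds with probability f,
# so a thread needs about 1/f probes in expectation -- and threads never
# contend on a central data structure, only on individual pages.
rng = random.Random(42)
pages = [False] * 1024             # all 1024 pages initially free
claimed = [random_search_alloc(pages, rng) for _ in range(512)]
```

Each call claims a distinct page because the shared state is updated on success; the interesting regime, which the paper's advanced designs target, is when the free fraction f is small and 1/f probes become expensive.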
| Authors: | Minh Pham, Yongke Yuan, Hao Li, Chengcheng Mou, Yicheng Tu, Zichen Xu, Jinghan Meng |
| Affiliations: | Minh Pham, Hao Li, Chengcheng Mou, Yicheng Tu, Jinghan Meng (University of South Florida, USA); Yongke Yuan (Beijing University of Technology, China); Zichen Xu (Nanchang University, China) |
| DOI | 10.1145/3701623 |
| Discipline | Computer Science |
| EISSN | 2329-4957 |
| Funding: | NIGMS NIH HHS, grant R01 GM140316 |
| GroupedDBID | 4.4 5VS AAKMM AALFJ AAYFX ACM ADBCU AEBYY AEFXT AEJOY AENSD AFWIH AFWXC AIKLT AKRVB ALMA_UNASSIGNED_HOLDINGS ASPBG AVWKF BDXCO CCLIF EBS EJD GUFHI LHSKQ NPM ROL 7X8 |
| ID | FETCH-LOGICAL-a351t-ed4a9efe3a652eebb41b70706f867752abe35851648a64c6a7468fedcb4b03782 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 0 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001430995000001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| Open access: | Yes |
| Peer reviewed: | Yes |
| Keywords: | memory allocation; buffer management; Computing methodologies → Massively parallel algorithms; random algorithm; GPU; parallel computing; shared memory algorithms |
| OpenAccessLink | https://dl.acm.org/doi/pdf/10.1145/3701623 |
| PMID | 39990623 |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/39990623 https://www.proquest.com/docview/3170275866 |