Dynamic Buffer Management in Massively Parallel Systems: The Power of Randomness


Saved in:
Detailed bibliography
Published in: ACM Transactions on Parallel Computing, Volume 12, Issue 1
Main authors: Pham, Minh; Yuan, Yongke; Li, Hao; Mou, Chengcheng; Tu, Yicheng; Xu, Zichen; Meng, Jinghan
Format: Journal Article
Language: English
Published: 2025-02-11
ISSN: 2329-4957
Abstract Massively parallel systems, such as Graphics Processing Units (GPUs), play an increasingly crucial role in today's data-intensive computing. The unique challenges associated with developing system software for massively parallel hardware to support numerous parallel threads efficiently are of paramount importance. One such challenge is the design of a dynamic memory allocator to allocate memory at runtime. Traditionally, memory allocators have relied on maintaining a global data structure, such as a queue of free pages. However, in the context of massively parallel systems, accessing such global data structures can quickly become a bottleneck even with multiple queues in place. This paper presents a novel approach to dynamic memory allocation that eliminates the need for a centralized data structure. Our proposed approach revolves around letting threads employ random search procedures to locate free pages. Through mathematical proofs and extensive experiments, we demonstrate that the basic random search design achieves lower latency than the best-known existing solution in most situations. Furthermore, we develop more advanced techniques and algorithms to tackle the challenge of warp divergence and further enhance performance when free memory is limited. Building upon these advancements, our mathematical proofs and experimental results affirm that these advanced designs can yield an order of magnitude improvement over the basic design and consistently outperform the state-of-the-art by up to two orders of magnitude. To illustrate the practical implications of our work, we integrate our memory management techniques into two GPU algorithms: a hash join and a group-by. Both case studies provide compelling evidence of our approach's pronounced performance gains.
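The abstract's core idea, letting each thread probe random page indices instead of contending on a shared free list, can be sketched in a few lines. This is an illustrative simulation, not the paper's code: all names are hypothetical, and the atomic claim a GPU thread would perform (e.g. an `atomicCAS` on the page's status word) is reduced to a plain flag flip in a single-threaded model.

```python
import random

def random_search_alloc(free, rng, max_tries=10_000):
    """Simulate one thread's random search for a free page.

    `free` is a list of booleans (True = page is free). On a real GPU
    the claim below would be an atomic compare-and-swap so that two
    threads cannot grab the same page; names here are illustrative.
    """
    n = len(free)
    for _ in range(max_tries):
        i = rng.randrange(n)
        if free[i]:          # on a GPU: atomicCAS(&status[i], FREE, USED)
            free[i] = False  # claim the page
            return i
    return None  # gave up: free memory (almost) exhausted

# Usage: 1024 pages, allocate half of them via random probing.
rng = random.Random(42)
pages = [True] * 1024
allocated = [random_search_alloc(pages, rng) for _ in range(512)]
assert len(set(allocated)) == 512  # all distinct pages claimed
assert sum(pages) == 512           # half of the pool remains free
```

Note the trade-off the paper's advanced designs address: as the fraction of free pages shrinks, the expected number of probes per allocation grows, which is why the basic scheme degrades when free memory is limited.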
Author Xu, Zichen
Li, Hao
Pham, Minh
Meng, Jinghan
Yuan, Yongke
Tu, Yicheng
Mou, Chengcheng
Author_xml – sequence: 1
  givenname: Minh
  surname: Pham
  fullname: Pham, Minh
  organization: University of South Florida, USA
– sequence: 2
  givenname: Yongke
  surname: Yuan
  fullname: Yuan, Yongke
  organization: Beijing University of Technology, China
– sequence: 3
  givenname: Hao
  surname: Li
  fullname: Li, Hao
  organization: University of South Florida, USA
– sequence: 4
  givenname: Chengcheng
  surname: Mou
  fullname: Mou, Chengcheng
  organization: University of South Florida, USA
– sequence: 5
  givenname: Yicheng
  surname: Tu
  fullname: Tu, Yicheng
  organization: University of South Florida, USA
– sequence: 6
  givenname: Zichen
  surname: Xu
  fullname: Xu, Zichen
  organization: Nanchang University, China
– sequence: 7
  givenname: Jinghan
  surname: Meng
  fullname: Meng, Jinghan
  organization: University of South Florida, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/39990623 (View this record in MEDLINE/PubMed)
ContentType Journal Article
DBID NPM
7X8
DOI 10.1145/3701623
DatabaseName PubMed
MEDLINE - Academic
DatabaseTitle PubMed
MEDLINE - Academic
DatabaseTitleList PubMed
MEDLINE - Academic
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Computer Science
EISSN 2329-4957
ExternalDocumentID 39990623
Genre Journal Article
GrantInformation_xml – fundername: NIGMS NIH HHS
  grantid: R01 GM140316
ISICitedReferencesCount 0
ISSN 2329-4957
IngestDate Thu Oct 02 11:33:26 EDT 2025
Mon Jul 21 05:55:40 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords memory allocation
buffer management
Computing methodologies → Massively parallel algorithms
random algorithm
GPU
parallel computing
Shared memory algorithms
Language English
LinkModel DirectLink
OpenAccessLink https://dl.acm.org/doi/pdf/10.1145/3701623
PMID 39990623
PQID 3170275866
PQPubID 23479
ParticipantIDs proquest_miscellaneous_3170275866
pubmed_primary_39990623
PublicationCentury 2000
PublicationDate 2025-02-11
PublicationDateYYYYMMDD 2025-02-11
PublicationDate_xml – month: 02
  year: 2025
  text: 2025-02-11
  day: 11
PublicationDecade 2020
PublicationTitle ACM transactions on parallel computing
PublicationTitleAlternate ACM Trans Parallel Comput
PublicationYear 2025
SourceID proquest
pubmed
SourceType Aggregation Database
Index Database
Title Dynamic Buffer Management in Massively Parallel Systems: The Power of Randomness
URI https://www.ncbi.nlm.nih.gov/pubmed/39990623
https://www.proquest.com/docview/3170275866
Volume 12