BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads

Detailed bibliography
Published in: Proceedings / International Conference on Parallel Architectures and Compilation Techniques, pp. 29-42
Main authors: Iwasaki, Shintaro; Amer, Abdelhalim; Taura, Kenjiro; Seo, Sangmin; Balaji, Pavan
Format: Conference paper
Language: English
Publication details: IEEE, 1 September 2019
ISSN: 2641-7936
Abstract: OpenMP is widely used by many applications, computational libraries, and runtime systems. As a result, multiple levels of the software stack use OpenMP independently of one another, often leading to nested parallel regions. Although exploiting such nested parallelism is a potential opportunity for performance improvement, it often severely degrades performance with leading OpenMP runtimes because of their reliance on heavyweight OS-level threads. User-level threads (ULTs) are a more lightweight alternative, but existing ULT-based runtimes suffer from several shortcomings: 1) thread management costs remain significant and outweigh the benefits of additional parallelism; 2) the shift to ULTs often hurts the more common flat parallelism case; and 3) users have no control over thread-to-CPU binding, a critical feature on modern systems. This paper presents BOLT, a practical ULT-based OpenMP runtime system that efficiently supports both flat and nested parallelism. This is accomplished on three fronts: 1) advanced data reuse and thread synchronization strategies; 2) thread coordination that adapts to the level of oversubscription; and 3) an implementation of the modern OpenMP thread-to-CPU binding interface tailored to ULT-based runtimes. The result is a highly optimized runtime that transparently achieves performance similar to that of leading, widely used state-of-the-art OpenMP runtimes under flat parallelism, while outperforming all existing runtimes under nested parallelism.
Author affiliations:
1. Iwasaki, Shintaro (The University of Tokyo)
2. Amer, Abdelhalim (Argonne National Laboratory)
3. Taura, Kenjiro (The University of Tokyo)
4. Seo, Sangmin (Argonne National Laboratory)
5. Balaji, Pavan (Argonne National Laboratory)
DOI: 10.1109/PACT.2019.00011
Databases: IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present
Discipline: Computer Science
EISBN: 172813613X; 9781728136134
EISSN: 2641-7936
End page: 42
IEEE Xplore document number: 8891628
Genre: Original research
Times cited (Web of Science): 12
Peer reviewed: false
Scholarly: true
Page count: 14
Publication title: Proceedings / International Conference on Parallel Architectures and Compilation Techniques
Publication title abbreviation: PACT
Publication year: 2019
Publisher: IEEE
Subject terms: Concurrency control; Fasteners; Libraries; Multithreading; OpenMP; Parallel processing; Runtime; Runtime Systems; Software; Task analysis; User Level Threads
URL: https://ieeexplore.ieee.org/document/8891628