BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads

Detailed bibliography
Published in:	Proceedings / International Conference on Parallel Architectures and Compilation Techniques, pp. 29-42
Main authors:	Iwasaki, Shintaro; Amer, Abdelhalim; Taura, Kenjiro; Seo, Sangmin; Balaji, Pavan
Format:	Conference paper
Language:	English
Publication details:	IEEE, 1 September 2019
Subjects:	Concurrency control; Fasteners; Libraries; Multithreading; OpenMP; Parallel processing; Runtime; Runtime Systems; Software; Task analysis; User Level Threads
ISSN:	2641-7936
Online access:	Get full text

Abstract	OpenMP is widely used by a number of applications, computational libraries, and runtime systems. As a result, multiple levels of the software stack use OpenMP independently of one another, often leading to nested parallel regions. Although exploiting such nested parallelism is a potential opportunity for performance improvement, it often causes destructive performance with leading OpenMP runtimes because of their reliance on heavyweight OS-level threads. User-level threads (ULTs) are more lightweight alternatives but existing ULT-based runtimes suffer from several shortcomings: 1) thread management costs remain significant and outweigh the benefits from additional parallelism; 2) the shift to ULTs often hurts the more common flat parallelism case; and 3) absence of user control over thread-to-CPU binding, a critical feature on modern systems. This paper presents BOLT, a practical ULT-based OpenMP runtime system that efficiently supports both flat and nested parallelism. This is accomplished on three fronts: 1) advanced data reuse and thread synchronization strategies; 2) thread coordination that adapts to the level of oversubscription; and 3) an implementation of the modern OpenMP thread-to-CPU binding interface tailored to ULT-based runtimes. The result is a highly optimized runtime that transparently achieves similar performance compared with leading state-of-the-art widely used OpenMP runtimes under flat parallelism, while outperforming all existing runtimes under nested parallelism.
Author	Iwasaki, Shintaro; Amer, Abdelhalim; Taura, Kenjiro; Seo, Sangmin; Balaji, Pavan
Author_xml	– sequence: 1 givenname: Shintaro surname: Iwasaki fullname: Iwasaki, Shintaro organization: The University of Tokyo – sequence: 2 givenname: Abdelhalim surname: Amer fullname: Amer, Abdelhalim organization: Argonne National Laboratory – sequence: 3 givenname: Kenjiro surname: Taura fullname: Taura, Kenjiro organization: The University of Tokyo – sequence: 4 givenname: Sangmin surname: Seo fullname: Seo, Sangmin organization: Argonne National Laboratory – sequence: 5 givenname: Pavan surname: Balaji fullname: Balaji, Pavan organization: Argonne National Laboratory
ContentType	Conference Proceeding
DOI	10.1109/PACT.2019.00011
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISBN	172813613X 9781728136134
EISSN	2641-7936
EndPage	42
ExternalDocumentID	8891628
Genre	orig-research
ISICitedReferencesCount	12
IsPeerReviewed	false
IsScholarly	true
Language	English
PageCount	14
PublicationCentury	2000
PublicationDate	2019-Sept.
PublicationDateYYYYMMDD	2019-09-01
PublicationDate_xml	– month: 09 year: 2019 text: 2019-Sept.
PublicationDecade	2010
PublicationTitle	Proceedings / International Conference on Parallel Architectures and Compilation Techniques
PublicationTitleAbbrev	PACT
PublicationYear	2019
Publisher	IEEE
Publisher_xml	– name: IEEE
StartPage	29
Title	BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads
URI	https://ieeexplore.ieee.org/document/8891628