BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads
OpenMP is widely used by a number of applications, computational libraries, and runtime systems. As a result, multiple levels of the software stack use OpenMP independently of one another, often leading to nested parallel regions. Although exploiting such nested parallelism is a potential opportunit...
| Published in: | Proceedings / International Conference on Parallel Architectures and Compilation Techniques, pp. 29-42 |
|---|---|
| Main authors: | Shintaro Iwasaki, Abdelhalim Amer, Kenjiro Taura, Sangmin Seo, Pavan Balaji |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 1 September 2019 |
| ISSN: | 2641-7936 |
| Abstract | OpenMP is widely used by a number of applications, computational libraries, and runtime systems. As a result, multiple levels of the software stack use OpenMP independently of one another, often leading to nested parallel regions. Although exploiting such nested parallelism is a potential opportunity for performance improvement, it often causes destructive performance with leading OpenMP runtimes because of their reliance on heavyweight OS-level threads. User-level threads (ULTs) are more lightweight alternatives but existing ULT-based runtimes suffer from several shortcomings: 1) thread management costs remain significant and outweigh the benefits from additional parallelism; 2) the shift to ULTs often hurts the more common flat parallelism case; and 3) absence of user control over thread-to-CPU binding, a critical feature on modern systems. This paper presents BOLT, a practical ULT-based OpenMP runtime system that efficiently supports both flat and nested parallelism. This is accomplished on three fronts: 1) advanced data reuse and thread synchronization strategies; 2) thread coordination that adapts to the level of oversubscription; and 3) an implementation of the modern OpenMP thread-to-CPU binding interface tailored to ULT-based runtimes. The result is a highly optimized runtime that transparently achieves similar performance compared with leading state-of-the-art widely used OpenMP runtimes under flat parallelism, while outperforming all existing runtimes under nested parallelism. |
|---|---|
| Author | Taura, Kenjiro; Balaji, Pavan; Iwasaki, Shintaro; Amer, Abdelhalim; Seo, Sangmin |
| Author_xml | 1. Shintaro Iwasaki (The University of Tokyo); 2. Abdelhalim Amer (Argonne National Laboratory); 3. Kenjiro Taura (The University of Tokyo); 4. Sangmin Seo (Argonne National Laboratory); 5. Pavan Balaji (Argonne National Laboratory) |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/PACT.2019.00011 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 172813613X; 9781728136134 |
| EISSN | 2641-7936 |
| EndPage | 42 |
| ExternalDocumentID | 8891628 |
| Genre | orig-research |
| GroupedDBID | 123 23M 29O 6IE 6IL ACGFS AFFNX ALMA_UNASSIGNED_HOLDINGS CBEJK M43 RIE RIL RNS |
| ID | FETCH-LOGICAL-a275t-c90ecd7cc9bf19700e1a03e03e76d0ec1109aa1db08da3f5206c7197cc79b52d3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 12 |
| IngestDate | Wed Aug 27 02:43:19 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a275t-c90ecd7cc9bf19700e1a03e03e76d0ec1109aa1db08da3f5206c7197cc79b52d3 |
| PageCount | 14 |
| ParticipantIDs | ieee_primary_8891628 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-Sept. |
| PublicationDateYYYYMMDD | 2019-09-01 |
| PublicationDate_xml | – month: 09 year: 2019 text: 2019-Sept. |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings / International Conference on Parallel Architectures and Compilation Techniques |
| PublicationTitleAbbrev | PACT |
| PublicationYear | 2019 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0041653 ssib057737306 |
| Score | 2.1918356 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 29 |
| SubjectTerms | Concurrency control; Fasteners; Libraries; Multithreading; OpenMP; Parallel processing; Runtime; Runtime Systems; Software; Task analysis; User Level Threads |
| Title | BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads |
| URI | https://ieeexplore.ieee.org/document/8891628 |