MPI detach — Towards automatic asynchronous local completion

When aiming for large-scale parallel computing, waiting time due to network latency, synchronization, and load imbalance are the primary opponents of high parallel efficiency. A common approach to hide latency with computation is the use of non-blocking communication. In the presence of a consistent...

Detailed bibliography
Published in:	Parallel computing Volume 109; p. 102859
Main authors:	Protze, Joachim; Hermanns, Marc-André; Müller, Matthias S.; Nguyen, Van Man; Jaeger, Julien; Saillard, Emmanuelle; Carribault, Patrick; Barthou, Denis
Format:	Journal Article
Language:	English
Published:	Elsevier B.V 01.03.2022 Elsevier
Subjects:	Asynchronous communication; Code transformation; Computer Science; Distributed, Parallel, and Cluster Computing; Hybrid parallelism; Message Passing Interface; OpenMP tasking; Static analysis
ISSN:	0167-8191, 1872-7336

Abstract	When aiming for large-scale parallel computing, waiting times due to network latency, synchronization, and load imbalance are the primary obstacles to high parallel efficiency. A common way to hide latency behind computation is non-blocking communication. In the presence of a consistent load imbalance, synchronization cost is just the visible symptom of that imbalance. Tasking approaches such as those in OpenMP, TBB, OmpSs, or C++20 coroutines promise to expose a higher degree of concurrency, which can be distributed across the available execution units and significantly improve load balance. The non-blocking functionality currently available in MPI does not integrate seamlessly into such task-based parallelization. In this work, we present a slim extension of the MPI interface that allows seamless integration of non-blocking communication with the available concepts of asynchronous execution in OpenMP and C++. With this concept, task dependency graphs for asynchronous execution can span the full distributed-memory application. We furthermore investigate the compile-time analysis necessary to transform an application using blocking MPI communication into an application integrating OpenMP tasks with our proposed MPI interface extension. • MPI interface extensions to transfer request completion back to the MPI library. • Callback-driven notification of asynchronous completion back to the application. • Prototype implementation of the interface independent of the MPI implementation. • Integration of MPI communication into OpenMP task programming. • Compile-time analysis to convert blocking communication into non-blocking.
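The callback-driven completion idea named in the abstract can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions: it does not use MPI, and `ToyRequest`, `DetachEngine`, `detach`, and `progress` are hypothetical names chosen for illustration, not the interface proposed in the article. The application hands a pending request plus a callback to an engine, continues computing, and is notified once the request has completed.

```python
import threading
from typing import Any, Callable, List, Tuple


class ToyRequest:
    """Hypothetical stand-in for a non-blocking communication request."""

    def __init__(self) -> None:
        self._done = threading.Event()

    def complete(self) -> None:
        # A real library would mark completion when the transfer finishes.
        self._done.set()

    def test(self) -> bool:
        # Non-blocking completion check.
        return self._done.is_set()


class DetachEngine:
    """Toy progress engine: owns detached requests and fires their callbacks."""

    def __init__(self) -> None:
        self._pending: List[Tuple[ToyRequest, Callable[[Any], None], Any]] = []
        self._lock = threading.Lock()

    def detach(self, request: ToyRequest,
               callback: Callable[[Any], None], data: Any) -> None:
        # The caller gives up the request; completion is reported via callback.
        with self._lock:
            self._pending.append((request, callback, data))

    def progress(self) -> int:
        """Poll each detached request once; return the number of callbacks fired."""
        ready, still_pending = [], []
        with self._lock:
            for entry in self._pending:
                # Test each request exactly once so none is lost or fired twice.
                (ready if entry[0].test() else still_pending).append(entry)
            self._pending = still_pending
        # Run callbacks outside the lock so they may detach further requests.
        for _, callback, data in ready:
            callback(data)
        return len(ready)


def demo() -> List[str]:
    engine = DetachEngine()
    log: List[str] = []
    request = ToyRequest()
    engine.detach(request, log.append, "halo exchange finished")
    log.append("compute while the transfer is in flight")
    engine.progress()   # request still pending: no callback yet
    request.complete()  # simulate the message arriving
    engine.progress()   # callback fires now
    return log


if __name__ == "__main__":
    print(demo())
```

In a hybrid MPI+OpenMP setting, such a callback would typically fulfill the event of a task created with the OpenMP 5.0 `detach` clause (via `omp_fulfill_event`), so that dependent tasks are released only after the communication has completed; that wiring is not shown here.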
ArticleNumber	102859
Author	Protze, Joachim; Müller, Matthias S.; Barthou, Denis; Hermanns, Marc-André; Carribault, Patrick; Saillard, Emmanuelle; Nguyen, Van Man; Jaeger, Julien
Author_xml	– sequence: 1 givenname: Joachim orcidid: 0000-0003-0640-8966 surname: Protze fullname: Protze, Joachim email: protze@itc.rwth-aachen.de organization: RWTH Aachen University, ITC, Seffenter Weg 23, Aachen, 52074, Germany – sequence: 2 givenname: Marc-André surname: Hermanns fullname: Hermanns, Marc-André email: hermanns@itc.rwth-aachen.de organization: RWTH Aachen University, ITC, Seffenter Weg 23, Aachen, 52074, Germany – sequence: 3 givenname: Matthias S. surname: Müller fullname: Müller, Matthias S. email: mueller@itc.rwth-aachen.de organization: RWTH Aachen University, ITC, Seffenter Weg 23, Aachen, 52074, Germany – sequence: 4 givenname: Van Man surname: Nguyen fullname: Nguyen, Van Man email: van-man.nguyen.ocre@cea.fr organization: CEA, DAM, DIF, Arpajon, F-91297, France – sequence: 5 givenname: Julien orcidid: 0000-0003-0084-1574 surname: Jaeger fullname: Jaeger, Julien email: julien.jaeger@cea.fr organization: CEA, DAM, DIF, Arpajon, F-91297, France – sequence: 6 givenname: Emmanuelle surname: Saillard fullname: Saillard, Emmanuelle email: emmanuelle.saillard@inria.fr organization: Inria, 200 avenue de la vieille tour, Talence, 33400, France – sequence: 7 givenname: Patrick surname: Carribault fullname: Carribault, Patrick email: patrick.carribault@cea.fr organization: CEA, DAM, DIF, Arpajon, F-91297, France – sequence: 8 givenname: Denis surname: Barthou fullname: Barthou, Denis email: denis.barthou@inria.fr organization: Inria, 200 avenue de la vieille tour, Talence, 33400, France
BackLink	https://cea.hal.science/cea-03537990$$DView record in HAL
Cites_doi	10.1177/1094342014548772 10.1016/j.jpdc.2019.12.005 10.1145/3127024.3127033
ContentType	Journal Article
Copyright	2021 Elsevier B.V. Distributed under a Creative Commons Attribution 4.0 International License
Copyright_xml	– notice: 2021 Elsevier B.V. – notice: Distributed under a Creative Commons Attribution 4.0 International License
DBID	AAYXX CITATION 1XC VOOES
DOI	10.1016/j.parco.2021.102859
DatabaseName	CrossRef Hyper Article en Ligne (HAL) Hyper Article en Ligne (HAL) (Open Access)
DatabaseTitle	CrossRef
DatabaseTitleList
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISSN	1872-7336
ExternalDocumentID	oai:HAL:cea-03537990v1 10_1016_j_parco_2021_102859 S0167819121001022
ISICitedReferencesCount	3
ISSN	0167-8191
IngestDate	Tue Oct 14 20:44:30 EDT 2025 Sat Nov 29 07:22:52 EST 2025 Fri Feb 23 02:41:56 EST 2024
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Keywords	Asynchronous communication; OpenMP tasking; Hybrid parallelism; Message Passing Interface; Code transformation; Static analysis
Language	English
License	Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
LinkModel	OpenURL
ORCID	0000-0003-0084-1574 0000-0003-0640-8966
OpenAccessLink	https://cea.hal.science/cea-03537990
ParticipantIDs	hal_primary_oai_HAL_cea_03537990v1 crossref_primary_10_1016_j_parco_2021_102859 elsevier_sciencedirect_doi_10_1016_j_parco_2021_102859
PublicationCentury	2000
PublicationDate	March 2022 2022-03-00 2022-03
PublicationDateYYYYMMDD	2022-03-01
PublicationDate_xml	– month: 03 year: 2022 text: March 2022
PublicationDecade	2020
PublicationTitle	Parallel computing
PublicationYear	2022
Publisher	Elsevier B.V Elsevier
Publisher_xml	– name: Elsevier B.V – name: Elsevier
SSID	ssj0006480
Score	2.33202
SourceID	hal crossref elsevier
SourceType	Open Access Repository Index Database Publisher
StartPage	102859
SubjectTerms	Asynchronous communication; Code transformation; Computer Science; Distributed, Parallel, and Cluster Computing; Hybrid parallelism; Message Passing Interface; OpenMP tasking; Static analysis
Title	MPI detach — Towards automatic asynchronous local completion
URI	https://dx.doi.org/10.1016/j.parco.2021.102859 https://cea.hal.science/cea-03537990
Volume	109
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
journalDatabaseRights	– providerCode: PRVESC databaseName: Elsevier SD Freedom Collection Journals 2021 customDbUrl: eissn: 1872-7336 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0006480 issn: 0167-8191 databaseCode: AIEXJ dateStart: 19950101 isFulltext: true titleUrlDefault: https://www.sciencedirect.com providerName: Elsevier
linkProvider	Elsevier