A New Execution Model and Executor for Adaptively Optimizing the Performance of Parallel Algorithms Using HPX Runtime System

Developing parallel algorithms efficiently requires careful management of concurrency across diverse hardware architectures. C++ executors provide a standardized interface that simplifies the development process, allowing developers to write portable and uniform code. However, in some cases, they ma...

Bibliographic details
Published in: SN Computer Science, Volume 6, Issue 8, p. 911
Main authors: Mohammadiporshokooh, Karame; Brandt, Steven R.; Kaiser, Hartmut
Format: Journal Article
Language: English
Publication details: Singapore: Springer Nature Singapore, 1 December 2025
Springer Nature B.V
ISSN: 2661-8907, 2662-995X
Abstract Developing parallel algorithms efficiently requires careful management of concurrency across diverse hardware architectures. C++ executors provide a standardized interface that simplifies development and lets developers write portable, uniform code. However, in some cases they may not fully leverage hardware capabilities or optimally allocate resources for specific workloads, leading to potential performance inefficiencies. Our earlier conference paper [Adaptively Optimizing the Performance of HPX's Parallel Algorithms] introduced a preliminary strategy, based on cores and chunking (workload) and integrated into HPX's executor API, that dynamically optimizes workload distribution and resource allocation using runtime metrics and overheads. Building on that work, this paper introduces a more detailed model of the strategy. It evaluates the efficiency of the implementation (as an HPX executor) across a wide range of compute-bound and memory-bound workloads, on different architectures and with different algorithms. The results show consistent speedups across all tests, configurations, and workloads studied, offering improved performance through a familiar and user-friendly C++ executor API. Additionally, the paper highlights how runtime-driven executor adaptations can simplify performance optimization without increasing the complexity of algorithm development.
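The abstract does not give the model itself, so the following is only a minimal sketch of the general idea it describes: picking a core count and a chunk size from a measured per-iteration time and an estimated scheduling overhead. It is plain C++17 and does not use HPX. The names Plan and choose_plan, the parameters min_work_factor and chunks_per_core, and their default values are all hypothetical and are not taken from the paper or from the HPX API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical result type: how many cores to use and how many
// loop iterations to place in each chunk (task).
struct Plan {
    std::size_t cores;
    std::size_t chunk_size;
};

// Illustrative heuristic only (an assumption, not the paper's model).
//   n               - number of loop iterations
//   t_iter_ns       - measured average time of one iteration, in ns
//   overhead_ns     - estimated cost of scheduling one task, in ns
//   max_cores       - cores available to the executor
//   min_work_factor - each core should get at least this multiple of
//                     overhead_ns worth of work (assumed default)
//   chunks_per_core - chunks created per core for load balancing
//                     (assumed default)
inline Plan choose_plan(std::size_t n, double t_iter_ns, double overhead_ns,
                        std::size_t max_cores,
                        double min_work_factor = 10.0,
                        std::size_t chunks_per_core = 4)
{
    assert(n > 0 && max_cores > 0 && chunks_per_core > 0);
    assert(t_iter_ns > 0.0 && overhead_ns > 0.0 && min_work_factor > 0.0);

    // Estimated sequential runtime and the minimum work one core must
    // receive for the scheduling overhead to be amortized.
    const double total_ns = static_cast<double>(n) * t_iter_ns;
    const double min_core_ns = min_work_factor * overhead_ns;
    const double useful = total_ns / min_core_ns;

    // Use only as many cores as can each be given enough work,
    // bounded by 1 below and by max_cores above.
    std::size_t cores = 1;
    if (useful >= static_cast<double>(max_cores)) {
        cores = max_cores;
    } else if (useful >= 1.0) {
        cores = static_cast<std::size_t>(useful);
    }
    cores = std::min(cores, n);  // never more cores than iterations

    // Split the iterations into chunks_per_core chunks per core,
    // rounding the chunk size up so every iteration is covered.
    const std::size_t chunks = cores * chunks_per_core;
    const std::size_t chunk_size = (n + chunks - 1) / chunks;
    return Plan{cores, chunk_size};
}
```

Under these assumptions, 1,000 iterations of 50 ns each with a 1,000 ns task overhead and 8 available cores yield a plan of 5 cores and chunks of 50 iterations, while a very small loop (100 iterations of 10 ns) falls back to a single core. An adaptive executor would presumably feed such values into the runtime's own scheduling parameters; how HPX does this is not stated in this record.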
ArticleNumber 911
Author Mohammadiporshokooh, Karame
Brandt, Steven R.
Kaiser, Hartmut
Author_xml – sequence: 1
  givenname: Karame
  orcidid: 0009-0000-8349-3389
  surname: Mohammadiporshokooh
  fullname: Mohammadiporshokooh, Karame
  email: kmoham6@lsu.edu
  organization: Center of Computation and Technology, Louisiana State University, Department of Computer Science, Louisiana State University
– sequence: 2
  givenname: Steven R.
  orcidid: 0000-0002-7979-2906
  surname: Brandt
  fullname: Brandt, Steven R.
  organization: Center of Computation and Technology, Louisiana State University, Department of Computer Science, Louisiana State University
– sequence: 3
  givenname: Hartmut
  orcidid: 0000-0002-8712-2806
  surname: Kaiser
  fullname: Kaiser, Hartmut
  organization: Center of Computation and Technology, Louisiana State University, Department of Computer Science, Louisiana State University
ContentType Journal Article
Copyright The Author(s) 2025
The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Copyright_xml – notice: The Author(s) 2025
– notice: The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DBID C6C
AAYXX
CITATION
JQ2
DOI 10.1007/s42979-025-04442-y
DatabaseName SpringerOpen Free (Free internet resource, activated by CARLI)
CrossRef
ProQuest Computer Science Collection
DatabaseTitle CrossRef
ProQuest Computer Science Collection
DatabaseTitleList
ProQuest Computer Science Collection
CrossRef
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2661-8907
ExternalDocumentID 10_1007_s42979_025_04442_y
IEDL.DBID RSV
ISSN 2661-8907
2662-995X
IngestDate Wed Nov 05 14:46:05 EST 2025
Sat Nov 29 07:06:07 EST 2025
Sat Oct 18 23:02:08 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 8
Keywords HPX
Executors
Performance
Asynchronous many-task (AMT)
Parallel algorithms
Optimization
Language English
LinkModel DirectLink
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ORCID 0000-0002-7979-2906
0000-0002-8712-2806
0009-0000-8349-3389
OpenAccessLink https://link.springer.com/10.1007/s42979-025-04442-y
PQID 3261988523
PQPubID 6623307
ParticipantIDs proquest_journals_3261988523
crossref_primary_10_1007_s42979_025_04442_y
springer_journals_10_1007_s42979_025_04442_y
PublicationCentury 2000
PublicationDate 2025-12-01
PublicationDateYYYYMMDD 2025-12-01
PublicationDate_xml – month: 12
  year: 2025
  text: 2025-12-01
  day: 01
PublicationDecade 2020
PublicationPlace Singapore
PublicationPlace_xml – name: Singapore
– name: Kolkata
PublicationTitle SN computer science
PublicationTitleAbbrev SN COMPUT. SCI
PublicationYear 2025
Publisher Springer Nature Singapore
Springer Nature B.V
Publisher_xml – name: Springer Nature Singapore
– name: Springer Nature B.V
SSID ssj0002504465
Score 2.3108978
Snippet Developing parallel algorithms efficiently requires careful management of concurrency across diverse hardware architectures. C++ executors provide a...
SourceID proquest
crossref
springer
SourceType Aggregation Database
Index Database
Publisher
StartPage 911
SubjectTerms Algorithms
Application programming interface
C plus plus
C++ (programming language)
Computer Imaging
Computer Science
Computer Systems Organization and Communication Networks
Data Structures and Information Theory
Hardware
Information Systems and Communication Service
Libraries
Optimization
Original Research
Pattern Recognition and Graphics
Resource allocation
Run time (computers)
Software Engineering/Programming and Operating Systems
Vision
Workload
Workloads
Title A New Execution Model and Executor for Adaptively Optimizing the Performance of Parallel Algorithms Using HPX Runtime System
URI https://link.springer.com/article/10.1007/s42979-025-04442-y
https://www.proquest.com/docview/3261988523
Volume 6
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVAVX
  databaseName: SpringerLINK Contemporary 1997-Present
  customDbUrl:
  eissn: 2661-8907
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0002504465
  issn: 2661-8907
  databaseCode: RSV
  dateStart: 20190101
  isFulltext: true
  titleUrlDefault: https://link.springer.com/search?facet-content-type=%22Journal%22
  providerName: Springer Nature
– providerCode: PRVAVX
  databaseName: SpringerLINK Contemporary 1997-Present
  customDbUrl:
  eissn: 2661-8907
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0002504465
  issn: 2661-8907
  databaseCode: RSV
  dateStart: 20200101
  isFulltext: true
  titleUrlDefault: https://link.springer.com/search?facet-content-type=%22Journal%22
  providerName: Springer Nature