Multi-level load balancing with an integrated runtime approach
| Published in: | 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp. 31-40 |
|---|---|
| Main authors: | Bak, Seonmyeong; Menon, Harshitha; White, Sam; Diener, Matthias; Kale, Laxmikant |
| Format: | Conference paper |
| Language: | English |
| Publication details: | Piscataway, NJ, USA: IEEE Press; IEEE, 01.05.2018 |
| Series: | ACM Conferences |
| ISBN: | 1538658151, 9781538658154 |
| Abstract | The recent trend of increasing numbers of cores per chip has resulted in vast amounts of on-node parallelism. These high core counts result in hardware variability that introduces imbalance. Applications are also becoming more complex, resulting in dynamic load imbalance. Load imbalance of any kind can result in loss of performance and system utilization. We address the challenge of handling both transient and persistent load imbalances while maintaining locality with low overhead.
In this paper, we propose an integrated runtime system that combines the Charm++ distributed programming model with concurrent tasks to mitigate load imbalances within and across shared-memory address spaces. It utilizes a periodic assignment of work to cores based on load measurement, in combination with user-created tasks to handle load imbalance. We integrate OpenMP with Charm++ to enable creation of potential tasks via OpenMP's parallel loop construct. This is also available to MPI applications through the Adaptive MPI implementation. We demonstrate the benefits of our work on three applications. Lassen improves by 29.6% on Cori and 46.5% on Theta. We also demonstrate benefits for a Charm++ application, ChaNGa, which improves by 25.7% on Theta, and for an MPI proxy application, Kripke, using Adaptive MPI. |
|---|---|
| Author | White, Sam; Diener, Matthias; Kale, Laxmikant; Bak, Seonmyeong; Menon, Harshitha |
| Author_xml | 1. Bak, Seonmyeong (sbak5@illinois.edu), University of Illinois at Urbana-Champaign; 2. Menon, Harshitha (harshitha@llnl.gov), Lawrence Livermore National Laboratory; 3. White, Sam (white67@illinois.edu), University of Illinois at Urbana-Champaign; 4. Diener, Matthias (mdiener@illinois.edu), University of Illinois at Urbana-Champaign; 5. Kale, Laxmikant (kale@illinois.edu), University of Illinois at Urbana-Champaign |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/CCGRID.2018.00018 |
| EISBN | 1538658151, 9781538658154 |
| EndPage | 40 |
| ExternalDocumentID | 8411007 |
| Genre | orig-research |
| ISBN | 1538658151, 9781538658154 |
| ISICitedReferencesCount | 13 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Keywords | hybrid programming; load balancing; OpenMP; Charm; adaptive MPI |
| Language | English |
| MeetingName | CCGrid '18: 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing |
| PageCount | 10 |
| PublicationDate | 2018-05-01 |
| PublicationPlace | Piscataway, NJ, USA |
| PublicationSeriesTitle | ACM Conferences |
| PublicationTitle | 2018 18th IEEE ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) |
| PublicationTitleAbbrev | CCGRID |
| PublicationYear | 2018 |
| Publisher | IEEE Press; IEEE |
| StartPage | 31 |
| SubjectTerms | Adaptive MPI; Charm; Hardware; Hybrid Programming; Load Balancing; Load management; Load modeling; Message systems; OpenMP; Programming; Runtime; Task analysis |
| Title | Multi-level load balancing with an integrated runtime approach |
| URI | https://ieeexplore.ieee.org/document/8411007 |
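
The abstract describes two levels of balancing: a periodic assignment of work to cores based on measured load, and splitting loops into potential tasks that other cores can pick up. The Python sketch below illustrates both ideas under stated assumptions. It is not the Charm++ runtime, its load balancing strategy, or the OpenMP integration from the paper: `greedy_assign` is a hypothetical heaviest-first heuristic standing in for the measurement-based assignment, `split_loop` is a hypothetical stand-in for turning a parallel loop into chunks, and the object names and loads are invented for illustration.

```python
import heapq


def greedy_assign(loads, num_cores):
    """Assign objects to cores, heaviest first, always to the least-loaded core.

    loads: dict mapping a (hypothetical) object id to its measured load.
    Returns (assignment, core_loads).
    """
    # Min-heap of (current load, core index); ties go to the lower core index.
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending load, then by id so the result is deterministic.
    for obj, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        core_load, core = heapq.heappop(heap)
        assignment[obj] = core
        heapq.heappush(heap, (core_load + load, core))
    core_loads = [0.0] * num_cores
    for obj, core in assignment.items():
        core_loads[core] += loads[obj]
    return assignment, core_loads


def split_loop(start, stop, num_chunks):
    """Split the iteration range [start, stop) into near-equal chunks.

    Each chunk is a potential task that an idle core could execute.
    """
    base, extra = divmod(stop - start, num_chunks)
    chunks = []
    lo = start
    for i in range(num_chunks):
        hi = lo + base + (1 if i < extra else 0)
        if hi > lo:  # skip empty chunks when there are more chunks than iterations
            chunks.append((lo, hi))
        lo = hi
    return chunks


# Invented load measurements for six objects, rebalanced across two cores.
measured = {"a": 8.0, "b": 7.0, "c": 3.0, "d": 2.0, "e": 2.0, "f": 2.0}
assignment, core_loads = greedy_assign(measured, num_cores=2)

# A 10-iteration loop split into four potential tasks.
chunks = split_loop(0, 10, 4)

print(assignment)
print(core_loads)
print(chunks)
```

In this sketch the assignment step would be rerun periodically with fresh measurements to address persistent imbalance, while the loop chunks give idle cores something to execute between rebalancing steps, which is one way to read the abstract's distinction between persistent and transient imbalance.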

