An Optimal Locality-Aware Task Scheduling Algorithm Based on Bipartite Graph Modelling for Spark Applications
In Spark's distributed computing framework, cross-node and cross-rack data transfers produced by map tasks and reduce tasks are a common cause of performance degradation, such as prolonged overall execution time and network congestion. To address these problems, this article utilizes the bip...
| Published in: | IEEE Transactions on Parallel and Distributed Systems, Vol. 31, No. 10, pp. 2406-2420 |
|---|---|
| Main authors: | Zhongming Fu, Zhuo Tang, Li Yang, Chubo Liu |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.10.2020 |
| Subjects: | |
| ISSN: | 1045-9219, 1558-2183 |
| Online access: | Get full text |
| Abstract | In Spark's distributed computing framework, cross-node and cross-rack data transfers produced by map tasks and reduce tasks are a common cause of performance degradation, such as prolonged overall execution time and network congestion. To address these problems, this article uses bipartite graph modelling to propose an optimal locality-aware task scheduling algorithm. By considering global optimality, the algorithm generates the scheduling solution that is optimal with respect to data locality for both the map tasks and the reduce tasks. Because the two kinds of task use different communication modes, this article uses a unified graph to model map task scheduling and reduce task scheduling, respectively. Then, by calculating the communication cost matrix of the tasks, we formulate an optimal task scheduling scheme that minimizes overall communication cost and transform the problem into a well-known graph problem, minimum weighted bipartite matching (MWBM), which can be solved by the Kuhn-Munkres algorithm. In addition, this article proposes a locality-aware executor allocation strategy to improve data locality further. We implement our algorithm and strategy in Spark-2.4.1 and evaluate their performance using several representative micro-benchmarks, macro-benchmarks, and the HiBench benchmark suite. The experimental results verify that, by reducing network traffic and access latency, the proposed algorithm improves job performance substantially compared with some other task scheduling algorithms. |
|---|---|
| Author | Yang, Li; Fu, Zhongming; Liu, Chubo; Tang, Zhuo |
| Author_xml | 1. Fu, Zhongming (ORCID: 0000-0003-3041-6990; email: fuzhongming@hnu.edu.cn), College of Information Science and Engineering, and National Supercomputing Center in Changsha, Hunan University, Hunan, China; 2. Tang, Zhuo (ORCID: 0000-0001-9081-8153; email: ztang@hnu.edu.cn), College of Information Science and Engineering, and National Supercomputing Center in Changsha, Hunan University, Hunan, China; 3. Yang, Li (email: yanglixt@gmail.com), College of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China; 4. Liu, Chubo (ORCID: 0000-0002-2372-6715; email: liuchubo@hnu.edu.cn), College of Information Science and Engineering, and National Supercomputing Center in Changsha, Hunan University, Hunan, China |
| CODEN | ITDSEO |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D |
| DOI | 10.1109/TPDS.2020.2992073 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef; Technology Research Database; Computer and Information Systems Abstracts – Academic; Electronics & Communications Abstracts; ProQuest Computer Science Collection; Computer and Information Systems Abstracts; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering; Computer Science |
| EISSN | 1558-2183 |
| EndPage | 2420 |
| ExternalDocumentID | 10_1109_TPDS_2020_2992073 9085890 |
| Genre | orig-research |
| GrantInformation_xml | China Knowledge Centre for Engineering Sciences and Technology (grants CKCEST-2018-1-13, CKCEST-2019-2-13; funder ID 10.13039/501100012455); National Natural Science Foundation of China (grants 61873090, L1824034, L1924056; funder ID 10.13039/501100001809); Science and Technology on Parallel and Distributed Processing Laboratory (grant WDZC20195500110); National Basic Research Program of China (973 Program) and National Key Research and Development Program of China (grants 2018YFB1701401, 2018YFB0203804, 2017YFB0202201; funder ID 10.13039/501100012166); Ministry of Education-China Mobile Research Fund Project (grant MCM20170506) |
| ISICitedReferencesCount | 38 |
| ISSN | 1045-9219 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 10 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0002-2372-6715 0000-0001-9081-8153 0000-0003-3041-6990 |
| PQID | 2404042267 |
| PQPubID | 85437 |
| PageCount | 15 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-10-01 |
| PublicationDateYYYYMMDD | 2020-10-01 |
| PublicationDate_xml | – month: 10 year: 2020 text: 2020-10-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on parallel and distributed systems |
| PublicationTitleAbbrev | TPDS |
| PublicationYear | 2020 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Snippet | In the distributed computing framework of Spark, cross-node/rack data transfer produced by map tasks and reduce tasks are common problems resulting in... |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 2406 |
| SubjectTerms | Algorithms; Benchmarks; Communication; Communication cost; Communications traffic; Computer networks; data locality; Data transfer; Data transfer (computers); Distributed processing; Graph theory; Modelling; Optimal scheduling; Optimization; Performance degradation; Performance evaluation; Scheduling; Scheduling algorithms; Spark; Sparks; Task analysis; Task scheduling; weighted bipartite graph |
| Title | An Optimal Locality-Aware Task Scheduling Algorithm Based on Bipartite Graph Modelling for Spark Applications |
| URI | https://ieeexplore.ieee.org/document/9085890 https://www.proquest.com/docview/2404042267 |
| Volume | 31 |
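
The abstract describes scheduling as minimum weighted bipartite matching (MWBM) over a communication cost matrix, solved by the Kuhn-Munkres algorithm. The following is a minimal sketch of that idea only, not the authors' Spark implementation: the function name `min_cost_assignment`, the cost values (0 for node-local, 1 for rack-local, 2 for cross-rack), and the three-task example are all assumptions made for illustration.

```python
def min_cost_assignment(cost):
    """Kuhn-Munkres (Hungarian) algorithm using row/column potentials.

    cost[i][j] is a hypothetical communication cost of running task i on
    executor slot j. Requires len(cost) <= len(cost[0]).
    Returns (total_cost, assignment), where assignment[i] is the slot of task i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0] * (n + 1)    # potentials of tasks (1-indexed)
    v = [0] * (m + 1)    # potentials of slots (1-indexed)
    p = [0] * (m + 1)    # p[j] = task currently matched to slot j (0 = free)
    way = [0] * (m + 1)  # back-pointers along the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta, j1 = inf, 0
            # Find the unused slot with the smallest reduced cost.
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so that at least one new edge becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free slot: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    assignment = [None] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


# Hypothetical cost matrix: rows are tasks, columns are executor slots.
cost = [
    [0, 1, 2],
    [0, 2, 2],
    [1, 0, 2],
]
total, assignment = min_cost_assignment(cost)
```

On this example a greedy scheduler that lets each task in turn take its cheapest free slot ends with a total cost of 4, whereas the matching above reaches the global minimum of 2, which is the kind of gap that considering global optimality is meant to close.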