E-OSched: a load balancing scheduler for heterogeneous multicores
| Published in: | The Journal of Supercomputing, Vol. 74, No. 10, pp. 5399–5431 |
|---|---|
| Main authors: | Khalid, Yasir Noman; Aleem, Muhammad; Prodan, Radu; Iqbal, Muhammad Azhar; Islam, Muhammad Arshad |
| Format: | Journal Article |
| Language: | English |
| Publication details: | New York: Springer US, 1 October 2018; Springer Nature B.V |
| ISSN: | 0920-8542, 1573-0484 |

| Abstract | The contemporary multicore era has embraced heterogeneous computing devices as an efficient platform for executing compute-intensive applications. These heterogeneous devices are based on CPUs and GPUs. OpenCL is regarded as one of the industry standards for programming heterogeneous machines. Conventional application scheduling mechanisms allocate most applications to GPUs, leaving the CPU underutilized. This underutilization of slower devices (such as the CPU) often causes sub-optimal performance of data-parallel applications in terms of load balance, execution time, and throughput. Moreover, scheduling multiple applications on a heterogeneous system further aggravates this performance inefficiency. This paper attempts to address these deficiencies by introducing a novel scheduling strategy named OSched. An enhancement to OSched, named E-OSched, is also part of this study. OSched performs resource-aware assignment of jobs to both CPUs and GPUs while ensuring a balanced load. Load balancing is achieved by considering the computational requirements of jobs and the computing potential of each device. Load-balanced execution is beneficial in terms of lower execution time, higher throughput, and improved utilization. E-OSched reduces the magnitude of main memory contention during the concurrent job execution phase. The mathematical model of the proposed algorithms is evaluated by comparing its simulation results with those of different state-of-the-art scheduling heuristics. The results revealed that the proposed E-OSched performed significantly better than the state-of-the-art scheduling heuristics, achieving up to 8.09% improvement in execution time and up to 7.07% better throughput. |
|---|---|
| Author | Khalid, Yasir Noman (Capital University of Science and Technology); Aleem, Muhammad (Capital University of Science and Technology; ORCID 0000-0001-8342-5757; aleem@cust.edu.pk); Prodan, Radu (Alpen-Adria-Universität); Iqbal, Muhammad Azhar (Capital University of Science and Technology); Islam, Muhammad Arshad (Capital University of Science and Technology) |
| ContentType | Journal Article |
| Copyright | © Springer Science+Business Media, LLC, part of Springer Nature 2018; Copyright Springer Nature B.V. 2018 |
| DOI | 10.1007/s11227-018-2435-1 |
| Discipline | Computer Science |
| EISSN | 1573-0484 |
| EndPage | 5431 |
| ISICitedReferencesCount | 19 |
| ISSN | 0920-8542 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 10 |
| Keywords | Scheduling; Load balancing; Heterogeneous multicores; Data-parallel applications |
| Language | English |
| ORCID | 0000-0001-8342-5757 |
| PageCount | 33 |
| PublicationDate | 2018-10-01 |
| PublicationPlace | New York |
| PublicationSubtitle | An International Journal of High-Performance Computer Design, Analysis, and Use |
| PublicationTitle | The Journal of supercomputing |
| PublicationTitleAbbrev | J Supercomput |
| PublicationYear | 2018 |
| Publisher | Springer US; Springer Nature B.V |
| StartPage | 5399 |
| SubjectTerms | Compilers; Computation; Computer Science; Computer simulation; Industry standards; Interpreters; Load balancing; Processor Architectures; Programming Languages; Resource scheduling; Scheduling; Servers; State of the art |
| Title | E-OSched: a load balancing scheduler for heterogeneous multicores |
| URI | https://link.springer.com/article/10.1007/s11227-018-2435-1 https://www.proquest.com/docview/2117464945 |
| Volume | 74 |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=E-OSched%3A+a+load+balancing+scheduler+for+heterogeneous+multicores&rft.jtitle=The+Journal+of+supercomputing&rft.au=Khalid%2C+Yasir+Noman&rft.au=Aleem%2C+Muhammad&rft.au=Prodan%2C+Radu&rft.au=Iqbal%2C+Muhammad+Azhar&rft.date=2018-10-01&rft.pub=Springer+US&rft.issn=0920-8542&rft.eissn=1573-0484&rft.volume=74&rft.issue=10&rft.spage=5399&rft.epage=5431&rft_id=info:doi/10.1007%2Fs11227-018-2435-1&rft.externalDocID=10_1007_s11227_018_2435_1 |