E-OSched: a load balancing scheduler for heterogeneous multicores

Published in: The Journal of supercomputing, Volume 74, Issue 10, pp. 5399-5431
Main authors: Khalid, Yasir Noman; Aleem, Muhammad; Prodan, Radu; Iqbal, Muhammad Azhar; Islam, Muhammad Arshad
Format: Journal Article
Language: English
Publisher details: New York: Springer US, 1 October 2018
Springer Nature B.V
ISSN: 0920-8542, 1573-0484
Abstract Heterogeneous computing devices, built from CPUs and GPUs, have become a widely adopted platform for executing compute-intensive applications in the multicore era. OpenCL is one of the industry standards for programming heterogeneous machines. Conventional application scheduling mechanisms allocate most applications to GPUs while leaving the CPU underutilized. This underutilization of slower devices (such as the CPU) often leads to sub-optimal performance of data-parallel applications in terms of load balance, execution time, and throughput. Moreover, scheduling multiple applications on a heterogeneous system further aggravates the performance inefficiency. This paper addresses these deficiencies with a novel scheduling strategy named OSched, together with an enhancement named E-OSched. OSched performs resource-aware assignment of jobs to both CPUs and GPUs while ensuring a balanced load. Load balancing is achieved by considering the computational requirements of the jobs and the computing potential of each device. Load-balanced execution is beneficial in terms of lower execution time, higher throughput, and improved utilization. E-OSched reduces main memory contention during the concurrent job execution phase. The mathematical model of the proposed algorithms is evaluated by comparing simulation results with different state-of-the-art scheduling heuristics. The results show that the proposed E-OSched performs significantly better than the state-of-the-art scheduling heuristics, achieving up to 8.09% improved execution time and up to 7.07% better throughput.
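The idea of matching each job's computational requirement to a device's computing potential can be illustrated with a small sketch. This is not the OSched or E-OSched algorithm from the paper, whose details are not given in this record; it is a generic greedy earliest-finish-time heuristic. The `Device` class, the `assign_jobs` function, and the GFLOP and GFLOP/s figures are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    # Hypothetical device model: "gflops" stands in for computing potential.
    name: str
    gflops: float
    busy_until: float = 0.0               # time at which the device becomes free
    jobs: list = field(default_factory=list)


def assign_jobs(jobs, devices):
    """Greedily assign jobs (name -> work in GFLOP) to devices.

    Jobs are taken largest first; each goes to the device that would
    finish it earliest given that device's current load and speed.
    Returns the makespan (time at which the last device finishes).
    """
    ordered = sorted(jobs.items(), key=lambda kv: kv[1], reverse=True)
    for name, work in ordered:
        best = min(devices, key=lambda d: d.busy_until + work / d.gflops)
        best.busy_until += work / best.gflops
        best.jobs.append(name)
    return max(d.busy_until for d in devices)


if __name__ == "__main__":
    cpu = Device("cpu", gflops=100.0)     # assumed figures, not measured values
    gpu = Device("gpu", gflops=400.0)
    makespan = assign_jobs({"a": 400.0, "b": 400.0, "c": 100.0, "d": 100.0}, [cpu, gpu])
    print(makespan, cpu.jobs, gpu.jobs)
```

In this toy input the two large jobs land on the faster device and the two small ones on the slower device, so both devices stay busy instead of the CPU sitting idle. A contention-aware variant in the spirit of E-OSched would need an additional memory model that this sketch does not attempt.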
Author Aleem, Muhammad
Islam, Muhammad Arshad
Prodan, Radu
Khalid, Yasir Noman
Iqbal, Muhammad Azhar
Author_xml – sequence: 1
  givenname: Yasir Noman
  surname: Khalid
  fullname: Khalid, Yasir Noman
  organization: Capital University of Science and Technology
– sequence: 2
  givenname: Muhammad
  orcidid: 0000-0001-8342-5757
  surname: Aleem
  fullname: Aleem, Muhammad
  email: aleem@cust.edu.pk
  organization: Capital University of Science and Technology
– sequence: 3
  givenname: Radu
  surname: Prodan
  fullname: Prodan, Radu
  organization: Alpen-Adria-Universität
– sequence: 4
  givenname: Muhammad Azhar
  surname: Iqbal
  fullname: Iqbal, Muhammad Azhar
  organization: Capital University of Science and Technology
– sequence: 5
  givenname: Muhammad Arshad
  surname: Islam
  fullname: Islam, Muhammad Arshad
  organization: Capital University of Science and Technology
ContentType Journal Article
Copyright Springer Science+Business Media, LLC, part of Springer Nature 2018
Copyright Springer Nature B.V. 2018
Discipline Computer Science
EISSN 1573-0484
EndPage 5431
ISICitedReferencesCount 19
ISSN 0920-8542
IsPeerReviewed true
IsScholarly true
Issue 10
Keywords Scheduling
Load balancing
Heterogeneous multicores
Data-parallel applications
Language English
ORCID 0000-0001-8342-5757
PageCount 33
PublicationCentury 2000
PublicationDate 2018-10-01
PublicationDateYYYYMMDD 2018-10-01
PublicationDecade 2010
PublicationPlace New York
PublicationSubtitle An International Journal of High-Performance Computer Design, Analysis, and Use
PublicationTitle The Journal of supercomputing
PublicationTitleAbbrev J Supercomput
PublicationYear 2018
Publisher Springer US
Springer Nature B.V
SSID ssj0004373
Score 2.2581556
Snippet The contemporary multicore era has adhered to the heterogeneous computing devices as one of the proficient platforms to execute compute-intensive applications....
StartPage 5399
SubjectTerms Compilers
Computation
Computer Science
Computer simulation
Industry standards
Interpreters
Load balancing
Processor Architectures
Programming Languages
Resource scheduling
Scheduling
Servers
State of the art
Title E-OSched: a load balancing scheduler for heterogeneous multicores
URI https://link.springer.com/article/10.1007/s11227-018-2435-1
https://www.proquest.com/docview/2117464945
Volume 74