Adaptive heterogeneous scheduling for integrated GPUs

Published in: PACT '14: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques, August 24-27, 2014, Edmonton, AB, Canada, pp. 151-162
Main Authors: Kaleem, Rashid; Barik, Rajkishore; Shpeisman, Tatiana; Lewis, Brian T.; Chunling Hu; Pingali, Keshav
Format: Conference Paper
Language: English
Published: ACM, 01.08.2014
Abstract Many processors today integrate a CPU and GPU on the same die, which allows them to share resources like physical memory and lowers the cost of CPU-GPU communication. As a consequence, programmers can effectively utilize both the CPU and GPU to execute a single application. This paper presents novel adaptive scheduling techniques for integrated CPU-GPU processors. We present two online profiling-based scheduling algorithms: naïve and asymmetric. Our asymmetric scheduling algorithm uses low-overhead online profiling to automatically partition the work of data-parallel kernels between the CPU and GPU without input from application developers. It performs profiling on the CPU and GPU in a way that does not penalize GPU-centric workloads that run significantly faster on the GPU. It adapts to application characteristics by addressing: 1) load imbalance due to irregularity caused by, e.g., data-dependent control flow, 2) different amounts of work on each kernel call, and 3) multiple kernels with different characteristics. Unlike many existing approaches primarily targeting NVIDIA discrete GPUs, our scheduling algorithm does not require offline processing. We evaluate our asymmetric scheduling algorithm on a desktop system with an Intel 4th Generation Core processor using a set of sixteen regular and irregular workloads from diverse application areas. On average, our asymmetric scheduling algorithm performs within 3.2% of the maximum throughput achieved by a CPU-and-GPU oracle that always chooses the best work partitioning between the CPU and GPU. These results underscore the feasibility of online profile-based heterogeneous scheduling on integrated CPU-GPU processors.
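
The abstract describes online, profile-based partitioning of data-parallel kernels between the CPU and GPU. The C++ fragment below is a minimal sketch of that general idea, not the paper's algorithm: it times a small probe of the iteration space on each device, derives a relative throughput, and splits the remaining iterations proportionally. The names partition_work, cpu_run, and gpu_run are hypothetical; the GPU side is modeled as an ordinary callable, and the paper's asymmetric refinements (such as keeping the CPU probe small so GPU-friendly workloads are not slowed down) are omitted.

    // Hypothetical sketch of profile-based CPU/GPU work partitioning.
    // Assumes n is large enough that two small probes fit in [0, n).
    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <functional>

    using Kernel = std::function<void(std::size_t begin, std::size_t end)>;

    // Time one execution of k over [begin, end) and return seconds elapsed.
    static double time_kernel(const Kernel& k, std::size_t begin, std::size_t end) {
        auto t0 = std::chrono::steady_clock::now();
        k(begin, end);
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(t1 - t0).count();
    }

    void partition_work(std::size_t n, const Kernel& cpu_run, const Kernel& gpu_run) {
        const std::size_t probe = std::max<std::size_t>(1, n / 20);  // ~5% per device
        double t_cpu = time_kernel(cpu_run, 0, probe);
        double t_gpu = time_kernel(gpu_run, probe, 2 * probe);

        // Equal probe sizes, so relative throughput is the inverse time ratio.
        double gpu_share = t_cpu / (t_cpu + t_gpu);
        std::size_t rest_begin = 2 * probe;
        std::size_t split = rest_begin +
            static_cast<std::size_t>(gpu_share * static_cast<double>(n - rest_begin));

        // A real scheduler would dispatch both partitions concurrently;
        // this sketch runs them back to back for clarity.
        gpu_run(rest_begin, split);
        cpu_run(split, n);
        std::printf("GPU share of remaining work: %.1f%%\n", 100.0 * gpu_share);
    }

In an actual integrated-GPU setting, gpu_run would enqueue the kernel through an API such as OpenCL and the split would be re-estimated on later kernel invocations so the partition adapts to changing work per call.
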
Author Barik, Rajkishore
Shpeisman, Tatiana
Pingali, Keshav
Lewis, Brian T.
Chunling Hu
Kaleem, Rashid
Author_xml – sequence: 1
  givenname: Rashid
  surname: Kaleem
  fullname: Kaleem, Rashid
  email: rashid@cs.utexas.edu
  organization: Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
– sequence: 2
  givenname: Rajkishore
  surname: Barik
  fullname: Barik, Rajkishore
  email: rajkishore.barik@intel.com
  organization: Intel Labs., Santa Clara, CA, USA
– sequence: 3
  givenname: Tatiana
  surname: Shpeisman
  fullname: Shpeisman, Tatiana
  email: tatiana.shpeisman@intel.com
  organization: Intel Labs., Santa Clara, CA, USA
– sequence: 4
  givenname: Brian T.
  surname: Lewis
  fullname: Lewis, Brian T.
  email: brian.t.lewis@intel.com
  organization: Intel Labs., Santa Clara, CA, USA
– sequence: 5
  surname: Chunling Hu
  fullname: Chunling Hu
  email: chunling.hu@intel.com
  organization: Intel Labs., Santa Clara, CA, USA
– sequence: 6
  givenname: Keshav
  surname: Pingali
  fullname: Pingali, Keshav
  email: pingali@cs.utexas.edu
  organization: Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
ContentType Conference Proceeding
DOI 10.1145/2628071.2628088
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
Discipline Computer Science
EISBN 9781450328098
1450328091
EndPage 162
ExternalDocumentID 7855896
Genre orig-research
ISICitedReferencesCount 77
IngestDate Wed Aug 27 02:07:49 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 12
ParticipantIDs ieee_primary_7855896
PublicationCentury 2000
PublicationDate 2014-Aug.
PublicationDateYYYYMMDD 2014-08-01
PublicationDate_xml – month: 08
  year: 2014
  text: 2014-Aug.
PublicationDecade 2010
PublicationTitle PACT '14 : proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques : August 24-27, 2014, Edmonton, AB, Canada
PublicationTitleAbbrev PACT
PublicationYear 2014
Publisher ACM
Publisher_xml – name: ACM
SSID ssj0001455729
SourceID ieee
SourceType Publisher
StartPage 151
SubjectTerms C++ languages
Graphics processing units
Heterogeneous computing
integrated GPUs
Irregular applications
Kernel
load balancing
Programming
scheduling
Scheduling algorithms
Title Adaptive heterogeneous scheduling for integrated GPUs
URI https://ieeexplore.ieee.org/document/7855896