Adaptive heterogeneous scheduling for integrated GPUs

Published in: PACT '14: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques, August 24-27, 2014, Edmonton, AB, Canada, pp. 151-162
Main Authors: Kaleem, Rashid; Barik, Rajkishore; Shpeisman, Tatiana; Lewis, Brian T.; Chunling Hu; Pingali, Keshav
Format: Conference Paper
Language: English
Published: ACM, 01.08.2014
Abstract Many processors today integrate a CPU and GPU on the same die, which allows them to share resources like physical memory and lowers the cost of CPU-GPU communication. As a consequence, programmers can effectively utilize both the CPU and GPU to execute a single application. This paper presents novel adaptive scheduling techniques for integrated CPU-GPU processors. We present two online profiling-based scheduling algorithms: naïve and asymmetric. Our asymmetric scheduling algorithm uses low-overhead online profiling to automatically partition the work of data-parallel kernels between the CPU and GPU without input from application developers. It performs profiling on the CPU and GPU in a way that does not penalize GPU-centric workloads that run significantly faster on the GPU. It adapts to application characteristics by addressing: 1) load imbalance due to irregularity caused by, e.g., data-dependent control flow, 2) different amounts of work on each kernel call, and 3) multiple kernels with different characteristics. Unlike many existing approaches primarily targeting NVIDIA discrete GPUs, our scheduling algorithm does not require offline processing. We evaluate our asymmetric scheduling algorithm on a desktop system with an Intel 4th Generation Core processor using a set of sixteen regular and irregular workloads from diverse application areas. On average, our asymmetric scheduling algorithm performs within 3.2% of the maximum throughput achieved by a CPU-and-GPU oracle that always chooses the best work partitioning between the CPU and GPU. These results underscore the feasibility of online profile-based heterogeneous scheduling on integrated CPU-GPU processors.
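
The abstract describes online, profile-based partitioning of data-parallel kernels between the CPU and GPU. The C++ fragment below is a minimal sketch of that general idea, not the paper's algorithm: it times a small probe of the iteration space on each device, derives a relative throughput, and splits the remaining iterations proportionally. The names partition_work, cpu_run, and gpu_run are hypothetical; the GPU side is modeled as an ordinary callable, and the paper's asymmetric refinements (such as keeping the CPU probe small so GPU-friendly workloads are not slowed down) are omitted.

    // Hypothetical sketch of profile-based CPU/GPU work partitioning.
    // Assumes n is large enough that two small probes fit in [0, n).
    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <functional>

    using Kernel = std::function<void(std::size_t begin, std::size_t end)>;

    // Time one execution of k over [begin, end) and return seconds elapsed.
    static double time_kernel(const Kernel& k, std::size_t begin, std::size_t end) {
        auto t0 = std::chrono::steady_clock::now();
        k(begin, end);
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(t1 - t0).count();
    }

    void partition_work(std::size_t n, const Kernel& cpu_run, const Kernel& gpu_run) {
        const std::size_t probe = std::max<std::size_t>(1, n / 20);  // ~5% per device
        double t_cpu = time_kernel(cpu_run, 0, probe);
        double t_gpu = time_kernel(gpu_run, probe, 2 * probe);

        // Equal probe sizes, so relative throughput is the inverse time ratio.
        double gpu_share = t_cpu / (t_cpu + t_gpu);
        std::size_t rest_begin = 2 * probe;
        std::size_t split = rest_begin +
            static_cast<std::size_t>(gpu_share * static_cast<double>(n - rest_begin));

        // A real scheduler would dispatch both partitions concurrently;
        // this sketch runs them back to back for clarity.
        gpu_run(rest_begin, split);
        cpu_run(split, n);
        std::printf("GPU share of remaining work: %.1f%%\n", 100.0 * gpu_share);
    }

In an actual integrated-GPU setting, gpu_run would enqueue the kernel through an API such as OpenCL and the split would be re-estimated on later kernel invocations so the partition adapts to changing work per call.
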
Author Barik, Rajkishore
Shpeisman, Tatiana
Pingali, Keshav
Lewis, Brian T.
Chunling Hu
Kaleem, Rashid
Author_xml – sequence: 1
  givenname: Rashid
  surname: Kaleem
  fullname: Kaleem, Rashid
  email: rashid@cs.utexas.edu
  organization: Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
– sequence: 2
  givenname: Rajkishore
  surname: Barik
  fullname: Barik, Rajkishore
  email: rajkishore.barik@intel.com
  organization: Intel Labs., Santa Clara, CA, USA
– sequence: 3
  givenname: Tatiana
  surname: Shpeisman
  fullname: Shpeisman, Tatiana
  email: tatiana.shpeisman@intel.com
  organization: Intel Labs., Santa Clara, CA, USA
– sequence: 4
  givenname: Brian T.
  surname: Lewis
  fullname: Lewis, Brian T.
  email: brian.t.lewis@intel.com
  organization: Intel Labs., Santa Clara, CA, USA
– sequence: 5
  surname: Chunling Hu
  fullname: Chunling Hu
  email: chunling.hu@intel.com
  organization: Intel Labs., Santa Clara, CA, USA
– sequence: 6
  givenname: Keshav
  surname: Pingali
  fullname: Pingali, Keshav
  email: pingali@cs.utexas.edu
  organization: Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
ContentType Conference Proceeding
DOI 10.1145/2628071.2628088
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
Discipline Computer Science
EISBN 9781450328098
1450328091
EndPage 162
ExternalDocumentID 7855896
Genre orig-research
ISICitedReferencesCount 77
IngestDate Wed Aug 27 02:07:49 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 12
ParticipantIDs ieee_primary_7855896
PublicationCentury 2000
PublicationDate 2014-Aug.
PublicationDateYYYYMMDD 2014-08-01
PublicationDate_xml – month: 08
  year: 2014
  text: 2014-Aug.
PublicationDecade 2010
PublicationTitle PACT '14 : proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques : August 24-27, 2014, Edmonton, AB, Canada
PublicationTitleAbbrev PACT
PublicationYear 2014
Publisher ACM
Publisher_xml – name: ACM
SSID ssj0001455729
SourceID ieee
SourceType Publisher
StartPage 151
SubjectTerms C++ languages
Graphics processing units
Heterogeneous computing
integrated GPUs
Irregular applications
Kernel
load balancing
Programming
scheduling
Scheduling algorithms
Title Adaptive heterogeneous scheduling for integrated GPUs
URI https://ieeexplore.ieee.org/document/7855896