Adaptive heterogeneous scheduling for integrated GPUs
Many processors today integrate a CPU and GPU on the same die, which allows them to share resources like physical memory and lowers the cost of CPU-GPU communication. As a consequence, programmers can effectively utilize both the CPU and GPU to execute a single application. This paper presents novel...
| Published in: | PACT '14 : proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques : August 24-27, 2014, Edmonton, AB, Canada, pp. 151-162 |
|---|---|
| Main authors: | Kaleem, Rashid; Barik, Rajkishore; Shpeisman, Tatiana; Lewis, Brian T.; Chunling Hu; Pingali, Keshav |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 01.08.2014 |
| Subjects: | |
| Online access: | Get full text |
| Abstract | Many processors today integrate a CPU and GPU on the same die, which allows them to share resources like physical memory and lowers the cost of CPU-GPU communication. As a consequence, programmers can effectively utilize both the CPU and GPU to execute a single application. This paper presents novel adaptive scheduling techniques for integrated CPU-GPU processors. We present two online profiling-based scheduling algorithms: naïve and asymmetric. Our asymmetric scheduling algorithm uses low-overhead online profiling to automatically partition the work of data-parallel kernels between the CPU and GPU without input from application developers. It profiles on the CPU and GPU in a way that does not penalize GPU-centric workloads that run significantly faster on the GPU. It adapts to application characteristics by addressing: 1) load imbalance from irregularity caused by, e.g., data-dependent control flow, 2) different amounts of work on each kernel call, and 3) multiple kernels with different characteristics. Unlike many existing approaches, which primarily target NVIDIA discrete GPUs, our scheduling algorithm does not require offline processing. We evaluate our asymmetric scheduling algorithm on a desktop system with an Intel 4th Generation Core Processor using a set of sixteen regular and irregular workloads from diverse application areas. On average, our asymmetric scheduling algorithm performs within 3.2% of the maximum throughput with a CPU-and-GPU oracle that always chooses the best work partitioning between the CPU and GPU. These results underscore the feasibility of online profile-based heterogeneous scheduling on integrated CPU-GPU processors. |
|---|---|
| Author | Barik, Rajkishore; Shpeisman, Tatiana; Pingali, Keshav; Lewis, Brian T.; Chunling Hu; Kaleem, Rashid |
| Author_xml | – sequence: 1 givenname: Rashid surname: Kaleem fullname: Kaleem, Rashid email: rashid@cs.utexas.edu organization: Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA – sequence: 2 givenname: Rajkishore surname: Barik fullname: Barik, Rajkishore email: rajkishore.barik@intel.com organization: Intel Labs., Santa Clara, CA, USA – sequence: 3 givenname: Tatiana surname: Shpeisman fullname: Shpeisman, Tatiana email: tatiana.shpeisman@intel.com organization: Intel Labs., Santa Clara, CA, USA – sequence: 4 givenname: Brian T. surname: Lewis fullname: Lewis, Brian T. email: brian.t.lewis@intel.com organization: Intel Labs., Santa Clara, CA, USA – sequence: 5 surname: Chunling Hu fullname: Chunling Hu email: chunling.hu@intel.com organization: Intel Labs., Santa Clara, CA, USA – sequence: 6 givenname: Keshav surname: Pingali fullname: Pingali, Keshav email: pingali@cs.utexas.edu organization: Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/2628071.2628088 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9781450328098 1450328091 |
| EndPage | 162 |
| ExternalDocumentID | 7855896 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL ACM ALMA_UNASSIGNED_HOLDINGS APO CBEJK GUFHI LHSKQ RIE RIL |
| ID | FETCH-LOGICAL-a247t-5ac8e214ce812ca332c765f82f53ced45aefcad6e96d110d3ba1829cfd2357733 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 77 |
| IngestDate | Wed Aug 27 02:07:49 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a247t-5ac8e214ce812ca332c765f82f53ced45aefcad6e96d110d3ba1829cfd2357733 |
| PageCount | 12 |
| ParticipantIDs | ieee_primary_7855896 |
| PublicationCentury | 2000 |
| PublicationDate | 2014-Aug. |
| PublicationDateYYYYMMDD | 2014-08-01 |
| PublicationDate_xml | – month: 08 year: 2014 text: 2014-Aug. |
| PublicationDecade | 2010 |
| PublicationTitle | PACT '14 : proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques : August 24-27, 2014, Edmonton, AB, Canada |
| PublicationTitleAbbrev | PACT |
| PublicationYear | 2014 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssj0001455729 |
| Score | 2.2482362 |
| Snippet | Many processors today integrate a CPU and GPU on the same die, which allows them to share resources like physical memory and lowers the cost of CPU-GPU... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 151 |
| SubjectTerms | C++ languages; Graphics processing units; Heterogeneous computing; integrated GPUs; Irregular applications; Kernel; load balancing; Programming; scheduling; Scheduling algorithms |
| Title | Adaptive heterogeneous scheduling for integrated GPUs |
| URI | https://ieeexplore.ieee.org/document/7855896 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
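
The abstract describes using online profiling to partition the work of a data-parallel kernel between the CPU and GPU. The sketch below illustrates only the general idea of a throughput-proportional split. It is not the paper's naïve or asymmetric algorithm, whose details are not given in this record. All names (`profile_throughput`, `gpu_share`, `partition`) are hypothetical, and the device runners are assumed to be plain Python callables standing in for real CPU and GPU kernel launches.

```python
import time
from typing import Callable, Sequence, Tuple


def profile_throughput(run_chunk: Callable[[Sequence], object],
                       sample: Sequence) -> float:
    """Time one device on a small sample and return items per second.

    `run_chunk` is a hypothetical stand-in for launching the kernel
    on a device; a real system would use device timers instead.
    """
    start = time.perf_counter()
    run_chunk(sample)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a coarse clock.
    return len(sample) / max(elapsed, 1e-9)


def gpu_share(cpu_rate: float, gpu_rate: float) -> float:
    """Fraction of the work to give the GPU, proportional to throughput."""
    total = cpu_rate + gpu_rate
    if total <= 0:
        raise ValueError("at least one device must have positive throughput")
    return gpu_rate / total


def partition(n_items: int, cpu_rate: float, gpu_rate: float) -> Tuple[int, int]:
    """Split n_items into (cpu_items, gpu_items) by measured throughput."""
    n_gpu = round(n_items * gpu_share(cpu_rate, gpu_rate))
    return n_items - n_gpu, n_gpu
```

For example, if profiling suggests the GPU is three times as fast as the CPU on a kernel, `partition(1000, 1.0, 3.0)` assigns 250 items to the CPU and 750 to the GPU. Re-running the profiling step on each kernel call would be one way to follow the changing amounts of work the abstract mentions, though that is an assumption about usage rather than something stated in the record.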