EDGE: Event-Driven GPU Execution
GPUs are known to benefit structured applications with ample parallelism, such as deep learning in a datacenter. Recently, GPUs have shown promise for irregular streaming network tasks. However, the GPU's co-processor dependence on a CPU for task management, inefficiencies with fine-grained tas...
| Published in: | Proceedings / International Conference on Parallel Architectures and Compilation Techniques pp. 337 - 353 |
|---|---|
| Main Authors: | Hetherington, Tayler Hicklin; Lubeznov, Maria; Shah, Deval; Aamodt, Tor M. |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.09.2019 |
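| Subjects: | Central Processing Unit; GPU; Graphics processing units; Hardware; Instruction sets; Kernel; Multiprogramming; Networking; Task analysis; Throughput |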
| ISSN: | 2641-7936 |
| Abstract | GPUs are known to benefit structured applications with ample parallelism, such as deep learning in a datacenter. Recently, GPUs have shown promise for irregular streaming network tasks. However, the GPU's co-processor dependence on a CPU for task management, inefficiencies with fine-grained tasks, and limited multiprogramming capabilities introduce challenges with efficiently supporting latency-sensitive streaming tasks. This paper proposes an event-driven GPU execution model, EDGE, that enables non-CPU devices to directly launch preconfigured tasks on a GPU without CPU interaction. Along with freeing up the CPU to work on other tasks, we estimate that EDGE can reduce the kernel launch latency by 4.4x compared to the baseline CPU-launched approach. This paper also proposes a warp-level preemption mechanism to further reduce the end-to-end latency of fine-grained tasks in a shared GPU environment. We evaluate multiple optimizations that reduce the average warp preemption latency by 35.9x over waiting for a preempted warp to naturally flush the pipeline. When compared to waiting for the first available resources, we find that warp-level preemption reduces the average and tail warp scheduling latencies by 2.6x and 2.9x, respectively, and improves the average normalized turnaround time by 1.4x. |
|---|---|
| Author | Lubeznov, Maria; Hetherington, Tayler Hicklin; Aamodt, Tor M.; Shah, Deval |
| Author_xml | – sequence: 1 givenname: Tayler Hicklin surname: Hetherington fullname: Hetherington, Tayler Hicklin organization: The University of British Columbia – sequence: 2 givenname: Maria surname: Lubeznov fullname: Lubeznov, Maria organization: The University of British Columbia – sequence: 3 givenname: Deval surname: Shah fullname: Shah, Deval organization: The University of British Columbia – sequence: 4 givenname: Tor M. surname: Aamodt fullname: Aamodt, Tor M. organization: The University of British Columbia |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/PACT.2019.00034 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 172813613X 9781728136134 |
| EISSN | 2641-7936 |
| EndPage | 353 |
| ExternalDocumentID | 8891612 |
| Genre | orig-research |
| GroupedDBID | 123 23M 29O 6IE 6IL ACGFS AFFNX ALMA_UNASSIGNED_HOLDINGS CBEJK M43 RIE RIL RNS |
| ID | FETCH-LOGICAL-a275t-ad6c47cd5fc0a7e95cc8afb697f951f59d161cf46c7b5ff326c18301b226e8ec3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 5 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000550990200026&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:43:19 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a275t-ad6c47cd5fc0a7e95cc8afb697f951f59d161cf46c7b5ff326c18301b226e8ec3 |
| PageCount | 17 |
| ParticipantIDs | ieee_primary_8891612 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-Sept. |
| PublicationDateYYYYMMDD | 2019-09-01 |
| PublicationDate_xml | – month: 09 year: 2019 text: 2019-Sept. |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings / International Conference on Parallel Architectures and Compilation Techniques |
| PublicationTitleAbbrev | PACT |
| PublicationYear | 2019 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0041653 ssib057737306 |
| Score | 2.1171885 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 337 |
| SubjectTerms | Central Processing Unit; GPU; Graphics processing units; Hardware; Instruction sets; Kernel; Multiprogramming; Networking; Task analysis; Throughput |
| Title | EDGE: Event-Driven GPU Execution |
| URI | https://ieeexplore.ieee.org/document/8891612 |
| WOSCitedRecordID | wos000550990200026&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
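
The abstract reports its multiprogramming result as an improvement in average normalized turnaround time. As a minimal sketch, assuming the conventional definition (a task's turnaround time when sharing the GPU divided by its turnaround time when running alone, averaged over tasks, lower is better), the metric and the improvement factor can be computed as below. The record does not state the paper's exact formula, and the function names and timings here are hypothetical illustrations, not data from the paper.

```python
from statistics import mean


def normalized_turnaround(shared_time, isolated_time):
    """Slowdown of one task: time when sharing the GPU / time when alone."""
    if isolated_time <= 0:
        raise ValueError("isolated_time must be positive")
    return shared_time / isolated_time


def average_ntt(tasks):
    """Average normalized turnaround time over (shared, isolated) pairs."""
    return mean(normalized_turnaround(s, i) for s, i in tasks)


def improvement(baseline_antt, new_antt):
    """Improvement factor of a new policy over a baseline (higher is better)."""
    return baseline_antt / new_antt


# Hypothetical (shared, isolated) turnaround times in arbitrary units.
# These values are made up for illustration only.
wait_for_resources = [(30.0, 10.0), (50.0, 20.0)]
with_preemption = [(20.0, 10.0), (40.0, 20.0)]

if __name__ == "__main__":
    base = average_ntt(wait_for_resources)
    new = average_ntt(with_preemption)
    print(f"baseline ANTT = {base:.3f}, preemption ANTT = {new:.3f}")
    print(f"improvement = {improvement(base, new):.3f}x")
```

Expressing the result as a ratio of averages keeps it comparable to the other "Nx" figures in the abstract, which are also stated as baseline-over-proposed factors.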