EDGE: Event-Driven GPU Execution

Bibliographic Details
Published in: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 337-353
Main Authors: Hetherington, Tayler Hicklin; Lubeznov, Maria; Shah, Deval; Aamodt, Tor M. (The University of British Columbia)
Format: Conference Proceeding
Language: English
Published: IEEE, 01.09.2019
DOI: 10.1109/PACT.2019.00034
ISSN: 2641-7936
EISBN: 9781728136134
Subjects: Central Processing Unit; GPU; Graphics processing units; Hardware; Instruction sets; Kernel; Multiprogramming; Networking; Task analysis; Throughput
Online Access: https://ieeexplore.ieee.org/document/8891612
Abstract: GPUs are known to benefit structured applications with ample parallelism, such as deep learning in a datacenter. Recently, GPUs have shown promise for irregular streaming network tasks. However, the GPU's co-processor dependence on a CPU for task management, inefficiencies with fine-grained tasks, and limited multiprogramming capabilities introduce challenges with efficiently supporting latency-sensitive streaming tasks. This paper proposes an event-driven GPU execution model, EDGE, that enables non-CPU devices to directly launch preconfigured tasks on a GPU without CPU interaction. Along with freeing up the CPU to work on other tasks, we estimate that EDGE can reduce the kernel launch latency by 4.4x compared to the baseline CPU-launched approach. This paper also proposes a warp-level preemption mechanism to further reduce the end-to-end latency of fine-grained tasks in a shared GPU environment. We evaluate multiple optimizations that reduce the average warp preemption latency by 35.9x over waiting for a preempted warp to naturally flush the pipeline. When compared to waiting for the first available resources, we find that warp-level preemption reduces the average and tail warp scheduling latencies by 2.6x and 2.9x, respectively, and improves the average normalized turnaround time by 1.4x.
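
Note: EDGE itself is a hardware mechanism described in the paper, and the sketch below is not its interface. It is only a minimal CUDA analogue of the general idea of a preconfigured, event-driven GPU task, assuming a Linux-like setup where a persistent kernel may poll mapped pinned host memory: the kernel is launched once ahead of time, and an external agent (here the CPU stands in for a device such as a NIC) triggers work by writing a doorbell word, so no per-task kernel launch sits on the critical path. The names event_loop and doorbell are hypothetical.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Persistent "event kernel": one block, preconfigured once, reacts to doorbell writes.
    __global__ void event_loop(volatile unsigned *doorbell,
                               const int *pkt, int *out, int n)
    {
        while (true) {
            if (threadIdx.x == 0) {
                while (*doorbell == 0) { }          // spin until an event arrives
            }
            __syncthreads();
            if (*doorbell == 0xFFFFFFFFu) return;   // shutdown sentinel
            for (int i = threadIdx.x; i < n; i += blockDim.x)
                out[i] = pkt[i] + 1;                // placeholder per-"packet" work
            __syncthreads();
            if (threadIdx.x == 0) *doorbell = 0;    // acknowledge completion
        }
    }

    int main()
    {
        cudaSetDeviceFlags(cudaDeviceMapHost);      // allow mapped pinned host memory

        unsigned *db_host, *db_dev;
        cudaHostAlloc((void **)&db_host, sizeof(unsigned), cudaHostAllocMapped);
        cudaHostGetDevicePointer((void **)&db_dev, db_host, 0);
        volatile unsigned *doorbell = db_host;
        *doorbell = 0;

        const int n = 256;
        int *pkt, *out;
        cudaMallocManaged(&pkt, n * sizeof(int));
        cudaMallocManaged(&out, n * sizeof(int));
        for (int i = 0; i < n; ++i) pkt[i] = i;

        event_loop<<<1, 128>>>(db_dev, pkt, out, n);  // launched once, ahead of time

        *doorbell = 1;               // the CPU stands in for an external device here
        while (*doorbell != 0) { }   // wait for the GPU to acknowledge the event

        *doorbell = 0xFFFFFFFFu;     // ask the persistent kernel to exit
        cudaDeviceSynchronize();
        printf("out[0]=%d, out[%d]=%d\n", out[0], n - 1, out[n - 1]);

        cudaFree(pkt); cudaFree(out); cudaFreeHost(db_host);
        return 0;
    }

Whereas this sketch keeps a resident block busy polling, EDGE (per the abstract) lets non-CPU devices launch preconfigured tasks on the GPU directly, avoiding both the CPU round trip and the dedicated polling loop.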