Augmentation of Programs with CUDA Streams

A program running on a General Purpose Graphics Processing Unit (GPGPU) must stall if its data is not resident on the GPGPU. With the CUDA 2.0 architecture, data can be streamed while computation is still in progress. Exploiting this feature requires careful orchestration of data transfer and computati...

Bibliographic details
Published in: 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications, pp. 855-856
Main authors: Sharmistha, Amilkanthwar, M., Balachandran, S.
Format: Conference paper
Language: English
Published: IEEE, 1 July 2012
ISBN: 1467316318, 9781467316316
ISSN: 2158-9178
Abstract A program running on a General Purpose Graphics Processing Unit (GPGPU) must stall if its data is not resident on the GPGPU. With the CUDA 2.0 architecture, data can be streamed while computation is still in progress. Exploiting this feature requires careful orchestration of data transfer and computation, which typically demands significant effort from the programmer. We propose an approach for transforming C programs into programs that make use of CUDA streams. We identify the regions where data transfer and computation can be overlapped using the polyhedral framework PLUTO [2]. We use PLUTO to tile the source code automatically and use the streaming capabilities to overlap data transfer with computation. Our results show an average speedup of 1.5X over CUDA programs without streaming optimizations.
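
The overlap of data transfer and computation described in the abstract can be illustrated with a minimal hand-written sketch. It is not the code produced by the authors' transformation, and it does not involve PLUTO: the tile size is chosen by hand, and the kernel `scale_tile`, the function `scale_streamed`, and the `CHECK` macro are hypothetical names introduced only for this example. The sketch assumes that `h_data` points to pinned host memory (for example, allocated with `cudaMallocHost`), since asynchronous copies generally need pinned memory to overlap with kernel execution.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: return the first CUDA error to the caller.
// For brevity, resources are not released on the error path.
#define CHECK(call)                               \
    do {                                          \
        cudaError_t err_ = (call);                \
        if (err_ != cudaSuccess) return err_;     \
    } while (0)

// Hypothetical kernel: scales the n elements of one tile in place.
__global__ void scale_tile(float *tile, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[i] *= factor;
}

// Scales h_data[0..n) by factor, one tile at a time, alternating between
// two streams so that the copy of one tile can overlap with the kernel
// of another. h_data is assumed to be pinned host memory.
cudaError_t scale_streamed(float *h_data, int n, float factor, int tile_elems) {
    const int kStreams = 2;
    cudaStream_t stream[kStreams];
    float *d_tile[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaStreamCreate(&stream[s]));
        CHECK(cudaMalloc(&d_tile[s], tile_elems * sizeof(float)));
    }

    int tile = 0;
    for (int off = 0; off < n; off += tile_elems, ++tile) {
        int s = tile % kStreams;  // round-robin stream assignment
        int count = (n - off < tile_elems) ? (n - off) : tile_elems;
        size_t bytes = count * sizeof(float);

        // Operations issued to the same stream run in order, so reusing
        // d_tile[s] for a later tile is safe; operations in different
        // streams are free to overlap.
        CHECK(cudaMemcpyAsync(d_tile[s], h_data + off, bytes,
                              cudaMemcpyHostToDevice, stream[s]));
        scale_tile<<<(count + 255) / 256, 256, 0, stream[s]>>>(
            d_tile[s], count, factor);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h_data + off, d_tile[s], bytes,
                              cudaMemcpyDeviceToHost, stream[s]));
    }

    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaStreamSynchronize(stream[s]));
        CHECK(cudaFree(d_tile[s]));
        CHECK(cudaStreamDestroy(stream[s]));
    }
    return cudaSuccess;
}
```

A caller would allocate the host buffer with `cudaMallocHost`, fill it, call `scale_streamed`, and read the results from the same buffer once the function returns. Whether any speedup appears in practice depends on the device, the tile size, and the ratio of transfer time to kernel time; this sketch makes no claim about reproducing the 1.5X figure reported above.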
Author Sharmistha (sharmist@cse.iitm.ac.in)
Amilkanthwar, M. (madhur@cse.iitm.ac.in)
Balachandran, S. (shankar@cse.iitm.ac.in)
Affiliation Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Madras, Chennai, India (all three authors)
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ISPA.2012.132
Discipline Computer Science
EISBN 9780769547015
076954701X
EndPage 856
ExternalDocumentID 6280393
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
PageCount 2
PublicationCentury 2000
PublicationDate 2012-July
PublicationDateYYYYMMDD 2012-07-01
PublicationDate_xml – month: 07
  year: 2012
  text: 2012-July
PublicationDecade 2010
PublicationTitle 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications
PublicationTitleAbbrev ispa
PublicationYear 2012
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 855
SubjectTerms CUDA
Graphics processing unit
Kernel
Optimization
Parallel processing
Pluto
Tiles
Title Augmentation of Programs with CUDA Streams
URI https://ieeexplore.ieee.org/document/6280393