Augmentation of Programs with CUDA Streams

A program running on a General Purpose Graphics Processing Unit (GPGPU) has to stall if its data is not resident on the GPGPU. With the CUDA 2.0 architecture, data can be streamed while computation is still in progress. Exploiting this feature requires careful orchestration of data transfer and computati...

Detailed bibliography
Published in:	2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications, pp. 855-856
Main authors:	Sharmistha, Amilkanthwar, M., Balachandran, S.
Format:	Conference paper
Language:	English
Published:	IEEE, 1 July 2012
Subjects:	CUDA; Graphics processing unit; Kernel; Optimization; Parallel processing; Pluto; Tiles
ISBN:	1467316318, 9781467316316
ISSN:	2158-9178

Abstract	A program running on a General Purpose Graphics Processing Unit (GPGPU) has to stall if its data is not resident on the GPGPU. With the CUDA 2.0 architecture, data can be streamed while computation is still in progress. Exploiting this feature requires careful orchestration of data transfer and computation, which typically demands significant effort from the programmer. We propose an approach for transforming C programs into programs that make use of CUDA streams. We identify the regions where data transfer and computation can be overlapped by using a polyhedral framework called PLUTO [2]. We use the PLUTO framework to tile the source code automatically and use the streaming capabilities to overlap data transfer with computation. Our results show an average speedup of 1.5X over CUDA programs without streaming optimizations.
Author affiliations	Sharmistha (sharmist@cse.iitm.ac.in); Amilkanthwar, M. (madhur@cse.iitm.ac.in); Balachandran, S. (shankar@cse.iitm.ac.in); all at Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Madras, Chennai, India
CODEN	IEEPAD
DOI	10.1109/ISPA.2012.132
Discipline	Computer Science
EISBN	9780769547015 076954701X
URI	https://ieeexplore.ieee.org/document/6280393