Augmentation of Programs with CUDA Streams
| Published in: | 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications, pp. 855-856 |
|---|---|
| Main authors: | Sharmistha; Amilkanthwar, M.; Balachandran, S. |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 2012-07-01 |
| Subjects: | CUDA; Graphics processing unit; Kernel; Optimization; Parallel processing; Pluto; Tiles |
| ISBN: | 1467316318, 9781467316316 |
| ISSN: | 2158-9178 |
| Abstract | A program that runs on a general-purpose graphics processing unit (GPGPU) must stall whenever the data it needs is not resident in GPU memory. With the CUDA 2.0 architecture, data can be transferred while computation is in progress. Exploiting this feature requires careful orchestration of data transfer and computation, which typically demands significant effort from the programmer. We propose an approach for transforming C programs into programs that use CUDA streams. We identify the regions where data transfer and computation can be overlapped by using a polyhedral framework called PLUTO [2]. We use PLUTO to tile the source code automatically, and we use the streaming capabilities to overlap data transfer with computation. Our results show an average speedup of 1.5x over CUDA programs without streaming optimizations. |
|---|---|
| Author | Balachandran, S.; Amilkanthwar, M.; Sharmistha |
| Author_xml | 1. Sharmistha (sharmist@cse.iitm.ac.in); 2. Amilkanthwar, M. (madhur@cse.iitm.ac.in); 3. Balachandran, S. (shankar@cse.iitm.ac.in). Affiliation (all authors): Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Madras, Chennai, India |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ISPA.2012.132 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library; IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | RIE: IEEE Electronic Library (IEL), https://ieeexplore.ieee.org/ (source type: Publisher) |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9780769547015, 076954701X |
| EndPage | 856 |
| ExternalDocumentID | 6280393 |
| Genre | orig-research |
| ISBN | 1467316318, 9781467316316 |
| ISSN | 2158-9178 |
| IngestDate | Wed Aug 27 04:59:00 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 2 |
| ParticipantIDs | ieee_primary_6280393 |
| PublicationCentury | 2000 |
| PublicationDate | 2012-July |
| PublicationDateYYYYMMDD | 2012-07-01 |
| PublicationDecade | 2010 |
| PublicationTitle | 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications |
| PublicationTitleAbbrev | ispa |
| PublicationYear | 2012 |
| Publisher | IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 855 |
| SubjectTerms | CUDA; Graphics processing unit; Kernel; Optimization; Parallel processing; Pluto; Tiles |
| Title | Augmentation of Programs with CUDA Streams |
| URI | https://ieeexplore.ieee.org/document/6280393 |
| hasFullText | 1 |
| inHoldings | 1 |
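The abstract attributes the speedup to tiling the computation and overlapping data transfer with computation through CUDA streams. A simplified timing model can show why this overlap helps. The sketch below does not reproduce the authors' PLUTO-based transformation, and its numbers are not results from the paper. It assumes the following:

* The work is split into equal tiles.
* Copy and kernel costs scale linearly with tile size.
* Host-to-device (H2D) copies, kernels, and device-to-host (D2H) copies each run on their own hardware engine and process tiles in issue order.
* Launch and synchronization overheads are ignored.

The function names `serial_time` and `pipelined_time` and the millisecond totals are hypothetical.

```python
"""Simplified timing model: one blocking stream vs. tiles pipelined over streams.

All assumptions are illustrative, not taken from the paper: the work is
split into equal tiles; copy and kernel costs scale linearly with tile
size; host-to-device (H2D) copies, kernels, and device-to-host (D2H)
copies each run on their own engine and process tiles in issue order;
launch and synchronization overheads are ignored.
"""


def serial_time(total_h2d: float, total_kernel: float, total_d2h: float) -> float:
    """Time when each copy or kernel waits for the previous one (no overlap).

    Under the linear-cost assumption, tiling alone does not change this total.
    """
    return total_h2d + total_kernel + total_d2h


def pipelined_time(n_tiles: int, total_h2d: float, total_kernel: float,
                   total_d2h: float) -> float:
    """Time when tiles are spread across streams so that one tile's kernel
    can overlap the next tile's H2D copy and the previous tile's D2H copy."""
    t_h2d = total_h2d / n_tiles
    t_kernel = total_kernel / n_tiles
    t_d2h = total_d2h / n_tiles
    h2d_done = kernel_done = d2h_done = 0.0
    for _ in range(n_tiles):
        # The H2D engine copies tiles back to back.
        h2d_done += t_h2d
        # A kernel needs its input tile and a free compute engine.
        kernel_done = max(kernel_done, h2d_done) + t_kernel
        # A D2H copy needs its kernel's result and a free D2H engine.
        d2h_done = max(d2h_done, kernel_done) + t_d2h
    return d2h_done


if __name__ == "__main__":
    # Hypothetical totals in milliseconds: 8 ms in, 12 ms compute, 8 ms out.
    serial = serial_time(8.0, 12.0, 8.0)
    for n in (1, 4, 16):
        piped = pipelined_time(n, 8.0, 12.0, 8.0)
        print(f"{n:2d} tiles: {serial:.1f} ms -> {piped:.1f} ms ({serial / piped:.2f}x)")
```

With these hypothetical totals, the model gives 28 ms without overlap and 16 ms with four tiles. As the tiles shrink, the pipelined time approaches the cost of the slowest stage (here the 12 ms of computation) plus a short fill and drain. On real hardware, several factors limit how far tiling pays off, so tile size remains a tuning parameter:

* per-transfer latency,
* kernel-launch overhead,
* stream issue order, and
* serialized H2D and D2H copies on GPUs with a single copy engine.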

