An Efficient GPU Implementation of Inclusion-Based Pointer Analysis
We present an efficient GPU implementation of Andersen's whole-program inclusion-based pointer analysis, a fundamental analysis on which many others are based, including optimising compilers, bug detection and security analyses. Andersen's algorithm makes extensive modifications to the gra...
| Published in: | IEEE Transactions on Parallel and Distributed Systems, Vol. 27, No. 2, pp. 353-366 |
|---|---|
| Main authors: | Yu Su, Ding Ye, Jingling Xue, Xiang-Ke Liao |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2016-02-01 |
| Subjects: | |
| ISSN: | 1045-9219, 1558-2183 |
| Online access: | Get full text |
| Abstract | We present an efficient GPU implementation of Andersen's whole-program inclusion-based pointer analysis, a fundamental analysis on which many others are based, including optimising compilers, bug detection and security analyses. Andersen's algorithm makes extensive modifications to the graph that represents the pointer-manipulating statements in a program. These modifications are highly irregular, input-dependent and statically unpredictable, making it much more challenging to balance such graph workloads across a multitude of GPU cores than those dealt with by traditional graph algorithms such as DFS and BFS. To parallelise Andersen's analysis efficiently on GPUs, we introduce an imbalance-aware workload partitioning scheme that divides its workload dynamically among the concurrent warps, initially in a warp-centric manner (during the coarse-grain stage) and later, once a workload imbalance is detected, under a task-pool-based model (during the fine-grain stage). We further improve its performance by using an adaptive group propagation scheme to reduce some redundant traversals. For a set of 14 C benchmarks evaluated, our parallel implementation of Andersen's analysis achieves a significant speedup of 46 percent on average over the state-of-the-art on an NVIDIA Tesla K20c GPU. |
|---|---|
| Author | Jingling Xue; Xiang-Ke Liao; Ding Ye; Yu Su |
| Author_xml | – sequence: 1 givenname: Yu surname: Su fullname: Su, Yu – sequence: 2 givenname: Ding surname: Ye fullname: Ye, Ding – sequence: 3 givenname: Jingling surname: Xue fullname: Xue, Jingling – sequence: 4 givenname: Xiang-Ke surname: Liao fullname: Liao, Xiang-Ke |
| CODEN | ITDSEO |
| CitedBy_id | crossref_primary_10_1109_TSE_2018_2869336 crossref_primary_10_1134_S0361768819070041 crossref_primary_10_1007_s10664_025_10720_3 |
| Cites_doi | 10.1109/CGO.2011.5764696 10.1109/SC.2010.46 10.1145/1250734.1250767 10.1145/1290520.1290524 10.1145/1926385.1926445 10.1109/CGO.2009.9 10.1145/2491894.2466483 10.1109/CGO.2011.5764694 10.1145/2442516.2442531 10.1007/978-3-642-03237-0_12 10.1145/2259016.2259050 10.1145/781131.781144 10.1145/1941553.1941590 10.1023/B:SQJO.0000039791.93071.a2 10.1145/2442516.2442523 10.1145/996841.996859 10.1007/3-540-47764-0_15 10.1109/TSE.2014.2302311 10.1109/HiPC.2012.6507474 10.1007/978-3-540-77220-0_21 10.1002/spe.2214 10.1007/BFb0053565 10.1145/1390630.1390658 10.1145/1944862.1944872 10.1145/277650.277667 10.1109/PACT.2011.14 10.1007/978-3-642-28652-0_4 10.1109/PACT.2013.6618800 10.1007/978-3-642-19861-8_5 10.1145/1455770.1455778 10.1145/231379.231389 10.1145/2025113.2025160 10.1109/IPDPS.2011.59 10.1145/1772954.1772985 10.1145/2145816.2145831 10.1145/1869459.1869495 10.1145/1926385.1926390 10.1145/2338965.2336784 10.1109/ICPP.2014.54 10.1109/IPDPS.2013.37 10.1145/2581122.2544154 10.1145/1133981.1134027 10.1145/2370816.2370866 10.1109/HiPC.2013.6799110 10.1145/2541228.2555296 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Feb 2016 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Feb 2016 |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D F28 FR3 |
| DOI | 10.1109/TPDS.2015.2397933 |
| DatabaseName | IEEE Xplore (IEEE) IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional ANTE: Abstracts in New Technology & Engineering Engineering Research Database |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional Engineering Research Database ANTE: Abstracts in New Technology & Engineering |
| DatabaseTitleList | Technology Research Database Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering Computer Science |
| EISSN | 1558-2183 |
| EndPage | 366 |
| ExternalDocumentID | 3924551691 10_1109_TPDS_2015_2397933 7029117 |
| Genre | orig-research |
| GrantInformation_xml | Australian Research Council (grant IDs: DP130101970; DP150102109; funder ID: 10.13039/501100000923); National Natural Science Foundation of China (grant ID: 61170049; funder ID: 10.13039/501100001809) |
| GroupedDBID | --Z -~X .DC 0R~ 29I 4.4 5GY 6IK 97E AAJGR AARMG AASAJ AAWTH ABAZT ABQJQ ABVLG ACGFO ACIWK AENEX AGQYO AGSQL AHBIQ AKQYR ALMA_UNASSIGNED_HOLDINGS ASUFR ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 DU5 EBS EJD HZ~ IEDLZ IFIPE IPLJI JAVBF LAI M43 MS~ O9- OCL P2P PQQKQ RIA RIE RNS TN5 TWZ UHB AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D F28 FR3 |
| ID | FETCH-LOGICAL-c326t-7dfcce93bf81d708d533fc193e4e0e97ea58e386fb65cfb89a29ff04702419f73 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 9 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000370925200005&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1045-9219 |
| IngestDate | Sun Sep 28 09:38:53 EDT 2025 Sun Nov 09 08:22:23 EST 2025 Tue Nov 18 20:47:35 EST 2025 Sat Nov 29 03:36:08 EST 2025 Wed Aug 27 02:52:21 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Keywords | pointer analysis; compilers; GPGPU; Parallel graph algorithms |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/OAPA.html |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c326t-7dfcce93bf81d708d533fc193e4e0e97ea58e386fb65cfb89a29ff04702419f73 |
| PQID | 1757796379 |
| PQPubID | 85437 |
| PageCount | 14 |
| ParticipantIDs | proquest_miscellaneous_1793271461 crossref_primary_10_1109_TPDS_2015_2397933 crossref_citationtrail_10_1109_TPDS_2015_2397933 ieee_primary_7029117 proquest_journals_1757796379 |
| PublicationCentury | 2000 |
| PublicationDate | 2016-02-01 |
| PublicationDateYYYYMMDD | 2016-02-01 |
| PublicationDate_xml | – month: 02 year: 2016 text: 2016-02-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE transactions on parallel and distributed systems |
| PublicationTitleAbbrev | TPDS |
| PublicationYear | 2016 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| SSID | ssj0014504 |
| Score | 2.204676 |
| SourceID | proquest crossref ieee |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 353 |
| SubjectTerms | Adaptation models; Algorithm design and analysis; Algorithms; Balancing; Benchmarks; compilers; Computer information security; GPGPU; Graphics processing units; Graphs; Instruction sets; Optimization; Parallel graph algorithms; Partitioning algorithms; pointer analysis; Switches; Synchronization; Vectors; Workload |
| Title | An Efficient GPU Implementation of Inclusion-Based Pointer Analysis |
| URI | https://ieeexplore.ieee.org/document/7029117 https://www.proquest.com/docview/1757796379 https://www.proquest.com/docview/1793271461 |
| Volume | 27 |
| WOSCitedRecordID | wos000370925200005&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 1558-2183 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0014504 issn: 1045-9219 databaseCode: RIE dateStart: 19900101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=An+Efficient+GPU+Implementation+of+Inclusion-Based+Pointer+Analysis&rft.jtitle=IEEE+transactions+on+parallel+and+distributed+systems&rft.au=Yu+Su&rft.au=Ding+Ye&rft.au=Jingling+Xue&rft.au=Xiang-Ke+Liao&rft.date=2016-02-01&rft.pub=IEEE&rft.issn=1045-9219&rft.volume=27&rft.issue=2&rft.spage=353&rft.epage=366&rft_id=info:doi/10.1109%2FTPDS.2015.2397933&rft.externalDocID=7029117 |
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=1045-9219&client=summon |
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=1045-9219&client=summon |
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1045-9219&client=summon |
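
For readers unfamiliar with the analysis named in the abstract, the following is a minimal sequential sketch of inclusion-based (Andersen-style) pointer analysis as a worklist over a constraint graph. It is not the GPU implementation described in the article: the function name `andersen`, the four constraint lists, and the data layout are assumptions made for illustration only. It does show why the graph is modified during the analysis, since load and store constraints add new copy edges as points-to sets grow.

```python
from collections import defaultdict, deque


def andersen(addr_of, copies, loads, stores):
    """Sequential worklist solver for inclusion constraints (illustrative only).

    addr_of: (p, a) pairs for  p = &a
    copies:  (p, q) pairs for  p = q
    loads:   (p, q) pairs for  p = *q
    stores:  (p, q) pairs for  *p = q
    """
    pts = defaultdict(set)          # variable -> points-to set
    succ = defaultdict(set)         # edge src -> dst means pts[src] is a subset of pts[dst]
    loads_from = defaultdict(list)  # q -> [p, ...] for each p = *q
    stores_to = defaultdict(list)   # p -> [q, ...] for each *p = q
    work = deque()

    def add_edge(src, dst):
        # Add a new copy edge and propagate along it once.
        if dst not in succ[src]:
            succ[src].add(dst)
            if not pts[src] <= pts[dst]:
                pts[dst] |= pts[src]
                work.append(dst)

    for p, a in addr_of:
        pts[p].add(a)
        work.append(p)
    for p, q in copies:
        succ[q].add(p)
    for p, q in loads:
        loads_from[q].append(p)
    for p, q in stores:
        stores_to[p].append(q)

    while work:
        n = work.popleft()
        # Complex constraints: each pointee of n may add edges to the graph.
        for v in list(pts[n]):
            for p in loads_from[n]:   # p = *n  adds edge v -> p
                add_edge(v, p)
            for q in stores_to[n]:    # *n = q  adds edge q -> v
                add_edge(q, v)
        # Simple constraints: propagate along existing copy edges.
        for s in list(succ[n]):
            if not pts[n] <= pts[s]:
                pts[s] |= pts[n]
                work.append(s)

    return {v: set(s) for v, s in pts.items() if s}


# Hypothetical program: p = &a; q = &b; r = p; *r = q; s = *p
result = andersen(
    addr_of=[("p", "a"), ("q", "b")],
    copies=[("r", "p")],
    loads=[("s", "p")],
    stores=[("r", "q")],
)
```

In this example the store `*r = q` adds an edge from `q` to `a` only after `r` is found to point to `a`, which is the kind of input-dependent graph modification the abstract refers to. A parallel version would have to distribute the loop body over many threads, which this sketch does not attempt.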