Improving bug detection via context-based code representation learning and attention-based neural networks
Bug detection has been shown to be an effective way to help developers find bugs early, saving considerable effort and time in the software development process. Recently, deep learning-based bug detection approaches have outperformed traditional machine learning-based approaches, the...
| Published in: | Proceedings of the ACM on Programming Languages, Volume 3, Issue OOPSLA, pp. 1-30 |
|---|---|
| Main authors: | Yi Li, Shaohua Wang, Tien N. Nguyen, Son Van Nguyen |
| Format: | Journal Article |
| Language: | English |
| Published: | 2019-10-01 |
| ISSN: | 2475-1421 |
| Abstract | Bug detection has been shown to be an effective way to help developers find bugs early, saving considerable effort and time in the software development process. Recently, deep learning-based bug detection approaches have outperformed traditional machine learning-based approaches, rule-based program analysis approaches, and mining-based approaches. However, they are still limited in detecting bugs that involve multiple methods, and they suffer from a high false positive rate. In this paper, we propose an approach that combines contexts with an attention neural network to overcome those limitations. As the global context, we use the Program Dependence Graph (PDG) and Data Flow Graph (DFG) to connect the method under investigation with other relevant methods that might contribute to the buggy code. The global context is complemented by the local context extracted from paths on the AST built from the method's body. The use of the PDG and DFG enables our model to reduce the false positive rate; to compensate for the potential reduction in recall, we use an attention mechanism to put more weight on the buggy paths in the source code. That is, paths that are similar to buggy paths are ranked higher, improving the recall of our model. We conducted several experiments to evaluate our approach on a very large dataset of over 4.973M methods in 92 different project versions. The results show that our tool achieves a relative improvement of up to 160% in F-score compared with state-of-the-art bug detection approaches. Our tool detects 48 true bugs in the list of top 100 reported bugs, which is 24 more true bugs than the baseline approaches. We also report that our representation is better suited to bug detection and achieves a relative improvement of up to 206% in accuracy over the other representations. |
|---|---|
| Author | Li, Yi; Wang, Shaohua; Nguyen, Tien N.; Van Nguyen, Son |
| Author_xml | 1. Li, Yi (New Jersey Institute of Technology, USA); 2. Wang, Shaohua (New Jersey Institute of Technology, USA); 3. Nguyen, Tien N. (University of Texas at Dallas, USA); 4. Van Nguyen, Son (University of Texas at Dallas, USA) |
| ContentType | Journal Article |
| DOI | 10.1145/3360588 |
| Discipline | Computer Science |
| EISSN | 2475-1421 |
| EndPage | 30 |
| ISICitedReferencesCount | 120 |
| ISSN | 2475-1421 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | OOPSLA |
| Language | English |
| OpenAccessLink | https://dl.acm.org/doi/pdf/10.1145/3360588 |
| PageCount | 30 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-10-01 |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings of ACM on programming languages |
| PublicationYear | 2019 |
| StartPage | 1 |
| Title | Improving bug detection via context-based code representation learning and attention-based neural networks |
| Volume | 3 |
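
The abstract describes an attention mechanism that puts more weight on paths resembling buggy ones. The general idea of attention-weighted aggregation over path embeddings can be sketched as follows. This is only an illustrative toy, not the authors' model: the function names, the dot-product scoring, the `attention_vector` parameter, and the numbers are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(path_vectors, attention_vector):
    """Return (weights, aggregated_vector) for a set of path embeddings.

    Each path is scored by its dot product with `attention_vector`
    (a hypothetical learned parameter); softmax turns the scores into
    weights that sum to 1, and the weighted sum is the aggregate.
    """
    scores = [
        sum(p * a for p, a in zip(path, attention_vector))
        for path in path_vectors
    ]
    weights = softmax(scores)
    dim = len(attention_vector)
    aggregated = [
        sum(w * path[i] for w, path in zip(weights, path_vectors))
        for i in range(dim)
    ]
    return weights, aggregated


# Toy data (made up): three 2-dimensional path embeddings.
paths = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Hypothetical attention vector that favours the first dimension.
attention = [2.0, 0.0]

weights, method_vector = attend(paths, attention)
```

In this toy setup the first path receives the largest weight because it aligns best with the attention vector, which mirrors the abstract's point that paths similar to buggy paths are ranked higher.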