Improving bug detection via context-based code representation learning and attention-based neural networks

Bibliographic details
Published in: Proceedings of ACM on programming languages, Volume 3, Issue OOPSLA, pp. 1-30
Main authors: Li, Yi; Wang, Shaohua; Nguyen, Tien N.; Van Nguyen, Son
Format: Journal Article
Language: English
Published: 2019-10-01
ISSN: 2475-1421
Abstract Bug detection has been shown to be an effective way to help developers detect bugs early, thus saving considerable effort and time in the software development process. Recently, deep learning-based bug detection approaches have been more successful than traditional machine learning-based approaches, rule-based program analysis approaches, and mining-based approaches. However, they are still limited in detecting bugs that involve multiple methods, and they suffer from a high rate of false positives. In this paper, we propose an approach that combines contexts with an attention neural network to overcome those limitations. We propose to use the Program Dependence Graph (PDG) and Data Flow Graph (DFG) as the global context, connecting the method under investigation with the other relevant methods that might contribute to the buggy code. The global context is complemented by the local context extracted from the path on the AST built from the method’s body. The use of the PDG and DFG enables our model to reduce the false positive rate, while, to compensate for the potential reduction in recall, we use the attention neural network mechanism to put more weight on the buggy paths in the source code. That is, the paths that are similar to the buggy paths are ranked higher, thus improving the recall of our model. We have conducted several experiments to evaluate our approach on a very large dataset with more than 4.973M methods in 92 different project versions. The results show that our tool achieves a relative improvement of up to 160% in F-score compared with the state-of-the-art bug detection approaches. Our tool detects 48 true bugs in the list of the top 100 reported bugs, which is 24 more true bugs than the baseline approaches. We also report that our representation is better suited for bug detection and achieves a relative improvement of up to 206% in accuracy over the other representations.
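The abstract says that attention is used to put more weight on buggy paths, so that paths resembling them rank higher. A minimal sketch of that general idea follows, assuming a generic soft-attention pooling over path vectors. It is not the authors' architecture: the function names, the toy vectors, and the single attention vector are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(path_vectors, attention_vector):
    """Weight each path vector by its similarity to an attention vector.

    path_vectors: list of equal-length float lists, one per AST path
                  (hypothetical embeddings, not produced by a real model).
    attention_vector: float list standing in for a learned "buggy
                  direction" (an assumption made for this sketch).
    Returns the per-path weights and the weighted sum of the paths.
    """
    # Score each path by its dot product with the attention vector.
    scores = [
        sum(p * a for p, a in zip(vec, attention_vector))
        for vec in path_vectors
    ]
    weights = softmax(scores)
    dim = len(attention_vector)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, path_vectors))
        for i in range(dim)
    ]
    return weights, pooled


# Toy input: paths 0 and 2 point in nearly the same direction as the
# attention vector, path 1 does not.
paths = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
attention = [2.0, 0.0]
weights, method_vector = attention_pool(paths, attention)
```

In this toy run the first and third paths receive most of the weight, which mirrors the abstract's claim that paths similar to buggy ones are ranked higher. In a trained model the attention parameters would be learned from labeled data rather than fixed by hand.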
Author Wang, Shaohua
Nguyen, Tien N.
Li, Yi
Van Nguyen, Son
Author_xml – sequence: 1
  givenname: Yi
  surname: Li
  fullname: Li, Yi
  organization: New Jersey Institute of Technology, USA
– sequence: 2
  givenname: Shaohua
  surname: Wang
  fullname: Wang, Shaohua
  organization: New Jersey Institute of Technology, USA
– sequence: 3
  givenname: Tien N.
  surname: Nguyen
  fullname: Nguyen, Tien N.
  organization: University of Texas at Dallas, USA
– sequence: 4
  givenname: Son
  surname: Van Nguyen
  fullname: Van Nguyen, Son
  organization: University of Texas at Dallas, USA
CitedBy_id crossref_primary_10_1109_ACCESS_2024_3488904
crossref_primary_10_1109_TR_2024_3354965
crossref_primary_10_1016_j_jss_2021_111108
crossref_primary_10_1016_j_infsof_2023_107304
crossref_primary_10_3390_app15179358
crossref_primary_10_1109_TSE_2021_3105556
crossref_primary_10_1145_3699711
crossref_primary_10_1109_TSE_2022_3144348
crossref_primary_10_1016_j_jss_2022_111355
crossref_primary_10_1109_ACCESS_2023_3313598
crossref_primary_10_1109_TCE_2024_3524511
crossref_primary_10_1109_TSE_2025_3579574
crossref_primary_10_1016_j_scico_2024_103166
crossref_primary_10_1109_TSE_2022_3209590
crossref_primary_10_1109_TSE_2022_3173346
crossref_primary_10_1007_s10515_023_00379_9
crossref_primary_10_1109_TSE_2023_3313881
crossref_primary_10_1145_3688834
crossref_primary_10_32604_cmc_2024_057697
crossref_primary_10_1109_TDSC_2023_3308897
crossref_primary_10_1016_j_jss_2025_112570
crossref_primary_10_3390_app13031710
crossref_primary_10_1016_j_cose_2022_102823
crossref_primary_10_1145_3654441
crossref_primary_10_1007_s10664_025_10625_1
crossref_primary_10_1016_j_jss_2021_111011
crossref_primary_10_1109_TIFS_2020_3044773
crossref_primary_10_1145_3660804
crossref_primary_10_1145_3731448
crossref_primary_10_1109_TCE_2025_3572334
crossref_primary_10_1016_j_comcom_2022_06_035
crossref_primary_10_1016_j_infsof_2024_107543
crossref_primary_10_3390_app13020825
crossref_primary_10_1109_TSUSC_2023_3248965
crossref_primary_10_1007_s10515_025_00490_z
crossref_primary_10_1109_TSE_2023_3311796
crossref_primary_10_1109_TSE_2024_3503723
crossref_primary_10_1177_18724981251360514
crossref_primary_10_1109_JIOT_2025_3531512
crossref_primary_10_1002_smr_2330
crossref_primary_10_1109_ACCESS_2023_3263878
crossref_primary_10_1016_j_jss_2023_111934
crossref_primary_10_1016_j_jss_2022_111537
crossref_primary_10_1109_TSE_2023_3340267
crossref_primary_10_1145_3505247
crossref_primary_10_1016_j_cose_2024_103930
crossref_primary_10_1016_j_jisa_2022_103293
crossref_primary_10_1007_s11219_025_09709_4
crossref_primary_10_1145_3485275
crossref_primary_10_1016_j_engappai_2025_110381
crossref_primary_10_1016_j_jss_2024_112014
crossref_primary_10_1007_s10664_022_10275_7
crossref_primary_10_1007_s11219_025_09726_3
crossref_primary_10_1049_sfw2_12064
crossref_primary_10_1016_j_jss_2023_111941
crossref_primary_10_1016_j_eswa_2023_121865
crossref_primary_10_7717_peerj_cs_739
crossref_primary_10_1007_s10994_021_06078_4
crossref_primary_10_1016_j_neunet_2021_09_025
crossref_primary_10_1109_TSE_2024_3452595
Cites_doi 10.1145/1095430.1081754
10.1145/2786805.2786814
10.1145/3236024.3236068
10.1145/1095430.1081755
10.1145/2884781.2884870
10.1145/502059.502041
10.1145/3236024.3236032
10.1145/2884781.2884848
10.1145/2970276.2970341
10.1145/3196398.3196431
10.1145/2345156.2254075
10.1109/ICSM.2000.883028
10.4230/LIPIcs.SNAPL.2017.18
10.1145/2970276.2970326
10.1145/2813885.2737966
10.1145/1595696.1595767
10.1145/2939672.2939754
10.1145/2635868.2635922
10.1145/1176617.1176667
10.5555/2337223.2337322
10.1145/1831708.1831723
10.1145/1251535.1251537
10.1145/2884781.2884804
10.1145/512927.512945
10.1145/1499949.1499997
10.1145/24039.24041
10.1145/1287624.1287632
10.1145/1251535.1251536
10.1109/ICSME.2017.46
ContentType Journal Article
DBID AAYXX
CITATION
DOI 10.1145/3360588
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList CrossRef
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2475-1421
EndPage 30
ExternalDocumentID 10_1145_3360588
GroupedDBID AAKMM
AAYFX
AAYXX
ACM
AEFXT
AEJOY
AIKLT
AKRVB
ALMA_UNASSIGNED_HOLDINGS
CITATION
EBS
GUFHI
LHSKQ
M~E
OK1
ROL
ID FETCH-LOGICAL-c258t-885ff4df97765b87d26d89e47faa2ac9115b39faaee0da1868390207db3b6f213
ISICitedReferencesCount 120
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000685204500047&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 2475-1421
IngestDate Tue Nov 18 22:25:52 EST 2025
Sat Nov 29 07:48:57 EST 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue OOPSLA
Language English
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c258t-885ff4df97765b87d26d89e47faa2ac9115b39faaee0da1868390207db3b6f213
OpenAccessLink https://dl.acm.org/doi/pdf/10.1145/3360588
PageCount 30
ParticipantIDs crossref_citationtrail_10_1145_3360588
crossref_primary_10_1145_3360588
PublicationCentury 2000
PublicationDate 2019-10-01
PublicationDateYYYYMMDD 2019-10-01
PublicationDate_xml – month: 10
  year: 2019
  text: 2019-10-01
  day: 01
PublicationDecade 2010
PublicationTitle Proceedings of ACM on programming languages
PublicationYear 2019
SSID ssj0001934839
Score 2.5289009
SourceID crossref
SourceType Enrichment Source
Index Database
StartPage 1
Title Improving bug detection via context-based code representation learning and attention-based neural networks
Volume 3