Boosting Just-in-Time Defect Prediction with Specific Features of C/C++ Programming Languages in Code Changes

Bibliographic Details
Published in: Proceedings (IEEE/ACM International Conference on Mining Software Repositories. Online), pp. 472-484
Main Authors: Ni, Chao; Xu, Xiaodan; Yang, Kaiwen; Lo, David
Format: Conference Proceeding
Language: English
Published: IEEE, 2023-05-01
ISSN: 2574-3864
Abstract: Just-in-time (JIT) defect prediction classifies code changes as defect-inducing or clean, and many approaches have been proposed that rely on programming language-independent change-level features. However, programming languages differ in their characteristics, and these differences may affect the quality of software projects. Meanwhile, the C programming language, one of the most popular languages, is widely used in IT companies to develop foundational applications (e.g., operating systems, databases, and compilers), yet the effect of its change-level characteristics on project quality has not been fully investigated. Additionally, whether open-source C projects and commercial projects share similar important features has received little study. To address these limitations, in this paper we investigate the impact of programming language-specific features on the state-of-the-art JIT defect identification approach in an industrial setting. We collect and label the top-10 most starred C projects on GitHub (329,021 commits) and 8 C projects at an ICT company (12,983 commits). We also propose nine C-specific change-level features and focus our investigation on both the open-source C projects on GitHub and the C projects at the ICT company, considering three aspects: (1) the effectiveness of C-specific change-level features in improving the identification of defect-inducing changes, (2) the importance of features in identifying defect-inducing changes in open-source C projects versus commercial C projects, and (3) the effectiveness of combining language-independent features and C-specific features in a real-life setting at the ICT company.
Author affiliations:
Ni, Chao (chaoni@zju.edu.cn), Zhejiang University, China
Xu, Xiaodan (xiaodanxu@zju.edu.cn), Zhejiang University, China
Yang, Kaiwen (kwyang@zju.edu.cn), Zhejiang University, China
Lo, David (davidlo@smu.edu.sg), Singapore Management University, Singapore
DOI: 10.1109/MSR59073.2023.00072
EISBN: 9798350311846
Funding: National Natural Science Foundation of China (funder ID: 10.13039/501100001809)
Subjects: C/C++ programming language; Just-in-Time; Supervised Methods
URI: https://ieeexplore.ieee.org/document/10174108