Boosting Just-in-Time Defect Prediction with Specific Features of C/C++ Programming Languages in Code Changes

Bibliographic Details
Published in: Proceedings (IEEE/ACM International Conference on Mining Software Repositories. Online), pp. 472-484
Main Authors: Ni, Chao; Xu, Xiaodan; Yang, Kaiwen; Lo, David
Format: Conference Proceeding
Language: English
Published: IEEE 01.05.2023
ISSN: 2574-3864
Abstract: Just-in-time (JIT) defect prediction classifies code changes as defect-inducing or clean, and many approaches have been proposed that rely on programming language-independent change-level features. However, programming languages differ in their characteristics, and these differences may affect the quality of software projects. The C programming language, one of the most popular languages, is widely used in IT companies to develop foundational applications (e.g., operating systems, databases, and compilers), yet the effect of its change-level characteristics on project quality has not been fully investigated. Moreover, whether open-source C projects and commercial projects share similar important features has received little study. To address these limitations, we investigate the impact of programming language-specific features on the state-of-the-art JIT defect identification approach in an industrial setting. We collect and label the 10 most-starred C projects on GitHub (329,021 commits) and 8 C projects at an ICT company (12,983 commits). We also propose nine C-specific change-level features and study both the open-source projects on GitHub and the projects at the ICT company from three aspects: (1) the effectiveness of C-specific change-level features in improving the identification of defect-inducing changes, (2) how the importance of features for identifying defect-inducing changes compares between open-source C projects and commercial C projects, and (3) the effectiveness of combining language-independent features and C-specific features in a real-life setting at the ICT company.
Author details:
  1. Ni, Chao (chaoni@zju.edu.cn), Zhejiang University, China
  2. Xu, Xiaodan (xiaodanxu@zju.edu.cn), Zhejiang University, China
  3. Yang, Kaiwen (kwyang@zju.edu.cn), Zhejiang University, China
  4. Lo, David (davidlo@smu.edu.sg), Singapore Management University, Singapore
CODEN IEEPAD
DOI 10.1109/MSR59073.2023.00072
EISBN 9798350311846
EISSN 2574-3864
Genre orig-research
Funding: National Natural Science Foundation of China (funder ID 10.13039/501100001809)
PageCount 13
PublicationTitleAbbrev MSR
Subject Terms: C/C++ programming language; Just-in-Time; Supervised Methods
URI https://ieeexplore.ieee.org/document/10174108