Bibliographic Details
| Title: |
The impact of feature types, classifiers, and data balancing techniques on software vulnerability prediction models. |
| Authors: |
Maltesque, Guest Editors, Kaya, Aydin, Keceli, Ali Seydi, Catal, Cagatay, Tekinerdogan, Bedir |
| Source: |
Journal of Software: Evolution & Process; Sep2019, Vol. 31 Issue 9, pN.PAG-N.PAG, 1p |
| Subject Terms: |
RANDOM forest algorithms, PREDICTION models, SECURITY systems software, COMPUTER security vulnerabilities, COMPUTER software testing, CLASSIFICATION algorithms |
| Abstract: |
Software vulnerabilities pose an increasing security risk for software systems, because they might be exploited to attack and harm the system. Some security vulnerabilities can be detected by static analysis tools and penetration testing, but these techniques usually suffer from relatively high false positive rates. Software vulnerability prediction (SVP) models can be used to categorize software components into vulnerable and neutral components before the software testing phase and thereby increase the efficiency and effectiveness of the overall verification process. The performance of a vulnerability prediction model is usually affected by the adopted classification algorithm, the adopted features, and the data balancing approach. In this study, we empirically investigate the effect of these factors on the performance of SVP models. Our experiments cover four data balancing methods, seven classification algorithms, and three feature types. The experimental results show that data balancing methods are effective for highly unbalanced datasets, text‐based features are more useful, and ensemble‐based classifiers mostly provide better results. For smaller datasets, the Random Forest algorithm provides the best performance, and for larger datasets, RusboostTree achieves better performance. [ABSTRACT FROM AUTHOR] |
| Database: |
Complementary Index |