Measuring robustness of Feature Selection techniques on software engineering datasets

Bibliographic Details
Published in: 2011 IEEE International Conference on Information Reuse and Integration, pp. 309–314
Main Authors: Huanjing Wang; Khoshgoftaar, T. M.; Wald, R.
Format: Conference paper
Language: English
Published: IEEE, 01 August 2011
ISBN: 9781457709647, 1457709643
Online Access: Get full text
Description
Abstract: Feature Selection is a process which identifies irrelevant and redundant features from a high-dimensional dataset (that is, a dataset with many features), and removes these before further analysis is performed. Recently, the robustness (e.g., stability) of feature selection techniques has been studied, to examine the sensitivity of these techniques to changes in their input data. In this study, we investigate the robustness of six commonly used feature selection techniques as the magnitude of change to the datasets and the size of the selected feature subsets are varied. All experiments were conducted on 16 datasets from three real-world software projects. The experimental results demonstrate that Gain Ratio shows the least stability on average while two different versions of ReliefF show the most stability. Results also show that making smaller changes to the datasets has less impact on the stability of feature ranking techniques applied to those datasets.
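The study's setup — ranking features on perturbed copies of a dataset and comparing the selected top-k subsets — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the simple correlation-based ranker stands in for rankers such as Gain Ratio or ReliefF, and the average pairwise Jaccard overlap is one common stability measure; all function names are assumptions.

```python
# Hedged sketch of feature-ranking stability under data perturbation.
# The ranker and the Jaccard-based stability score are illustrative
# stand-ins; the paper's actual rankers (e.g., Gain Ratio, ReliefF)
# and its exact stability metric may differ.
import random


def rank_features(X, y):
    """Rank feature indices by |Pearson correlation| with the target.

    A deliberately simple stand-in for a feature-ranking technique.
    """
    n, m = len(X), len(X[0])
    my = sum(y) / n
    scores = []
    for j in range(m):
        col = [row[j] for row in X]
        mx = sum(col) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        vx = sum((a - mx) ** 2 for a in col)
        vy = sum((b - my) ** 2 for b in y)
        r = cov / ((vx * vy) ** 0.5) if vx and vy else 0.0
        scores.append(abs(r))
    # Highest-scoring features first.
    return sorted(range(m), key=lambda j: -scores[j])


def jaccard(a, b):
    """Jaccard similarity of two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def stability(X, y, k=5, runs=10, frac=0.9, seed=0):
    """Average pairwise Jaccard overlap of the top-k subsets selected
    on `runs` random subsamples; (1 - frac) plays the role of the
    'magnitude of change' to the dataset."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(runs):
        idx = rng.sample(range(len(X)), int(frac * len(X)))
        Xi = [X[i] for i in idx]
        yi = [y[i] for i in idx]
        subsets.append(rank_features(Xi, yi)[:k])
    pairs = [(i, j) for i in range(runs) for j in range(i + 1, runs)]
    return sum(jaccard(subsets[i], subsets[j]) for i, j in pairs) / len(pairs)
```

Under this sketch, a score near 1 means the ranker keeps selecting the same features as the data is perturbed (high stability), while a score near 0 means the selected subsets barely overlap — the kind of sensitivity the paper reports for Gain Ratio versus ReliefF.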
DOI: 10.1109/IRI.2011.6009565