Measuring robustness of Feature Selection techniques on software engineering datasets

Bibliographic details
Published in: 2011 IEEE International Conference on Information Reuse and Integration, pp. 309-314
Main authors: Huanjing Wang, Khoshgoftaar, T. M., Wald, R.
Format: Conference proceeding
Language: English
Published: IEEE, 2011-08-01
ISBN: 9781457709647, 1457709643
Abstract Feature Selection is a process which identifies irrelevant and redundant features from a high-dimensional dataset (that is, a dataset with many features), and removes these before further analysis is performed. Recently, the robustness (e.g., stability) of feature selection techniques has been studied, to examine the sensitivity of these techniques to changes in their input data. In this study, we investigate the robustness of six commonly used feature selection techniques as the magnitude of change to the datasets and the size of the selected feature subsets are varied. All experiments were conducted on 16 datasets from three real-world software projects. The experimental results demonstrate that Gain Ratio shows the least stability on average while two different versions of ReliefF show the most stability. Results also show that making smaller changes to the datasets has less impact on the stability of feature ranking techniques applied to those datasets.
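The experimental design described in the abstract (perturb a dataset, re-run the ranker, compare the selected subsets) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the perturbation is random removal of instances, the similarity measure is Jaccard overlap of the top-k subsets, the ranker is absolute Pearson correlation (a stand-in for techniques such as Gain Ratio or ReliefF), and the data are synthetic. The helper names score_features, top_k, jaccard, stability and make_synthetic are hypothetical.

```python
import random
from statistics import mean


def score_features(rows, labels):
    """Stand-in ranker (assumption): |Pearson correlation| of each feature with the label."""
    label_mean = mean(labels)
    label_var = sum((y - label_mean) ** 2 for y in labels)
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        col_mean = mean(column)
        col_var = sum((x - col_mean) ** 2 for x in column)
        cov = sum((x - col_mean) * (y - label_mean) for x, y in zip(column, labels))
        denom = (col_var * label_var) ** 0.5
        # A constant feature or constant label carries no signal; score it 0.
        scores.append(abs(cov) / denom if denom > 0 else 0.0)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features (k >= 1)."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Overlap of two non-empty feature subsets, in [0, 1]."""
    return len(a & b) / len(a | b)


def stability(rows, labels, k, keep_fraction, repeats=20, seed=0):
    """Mean overlap between the top-k subset on the full data and on random subsamples.

    keep_fraction (0 < keep_fraction <= 1) controls the magnitude of change:
    the closer to 1, the smaller the perturbation.
    """
    rng = random.Random(seed)
    reference = top_k(score_features(rows, labels), k)
    n_keep = max(2, round(keep_fraction * len(rows)))
    overlaps = []
    for _ in range(repeats):
        kept = rng.sample(range(len(rows)), n_keep)
        sub_rows = [rows[i] for i in kept]
        sub_labels = [labels[i] for i in kept]
        subset = top_k(score_features(sub_rows, sub_labels), k)
        overlaps.append(jaccard(reference, subset))
    return mean(overlaps)


def make_synthetic(n_rows=200, n_features=10, seed=1):
    """Synthetic data (assumption): only features 0 and 1 influence the label."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    labels = [1 if row[0] + row[1] + rng.gauss(0, 1) > 0 else 0 for row in rows]
    return rows, labels


rows, labels = make_synthetic()
# Vary both factors named in the abstract: perturbation size and subset size.
for keep_fraction in (0.95, 0.80, 0.67):
    for k in (2, 4, 6):
        print(keep_fraction, k, round(stability(rows, labels, k, keep_fraction), 3))
```

A stability value near 1 means the ranker selects almost the same features after the perturbation. Swapping score_features for another ranking function would allow techniques to be compared under the same protocol.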
Author Khoshgoftaar, T. M.
Wald, R.
Huanjing Wang
Author_xml – sequence: 1
  surname: Huanjing Wang
  fullname: Huanjing Wang
  email: huanjing.wang@wku.edu
  organization: Western Kentucky Univ., Bowling Green, KY, USA
– sequence: 2
  givenname: T. M.
  surname: Khoshgoftaar
  fullname: Khoshgoftaar, T. M.
  email: khoshgof@fau.edu
  organization: Florida Atlantic Univ., Boca Raton, FL, USA
– sequence: 3
  givenname: R.
  surname: Wald
  fullname: Wald, R.
  email: rdwald@gmail.com
  organization: Florida Atlantic Univ., Boca Raton, FL, USA
ContentType Conference Proceeding
DOI 10.1109/IRI.2011.6009565
EISBN 9781457709661
1457709651
9781457709654
145770966X
EndPage 314
ExternalDocumentID 6009565
Genre orig-research
ISBN 9781457709647
1457709643
IngestDate Wed Aug 27 02:42:43 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
PageCount 6
ParticipantIDs ieee_primary_6009565
PublicationCentury 2000
PublicationDate 2011-Aug.
PublicationDateYYYYMMDD 2011-08-01
PublicationDate_xml – month: 08
  year: 2011
  text: 2011-Aug.
PublicationDecade 2010
PublicationTitle 2011 IEEE International Conference on Information Reuse and Integration
PublicationTitleAbbrev IRI
PublicationYear 2011
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 309
SubjectTerms Analysis of variance
fault-prone program module
Indexes
Radio frequency
Robustness
Robustness of feature selection
Software
software metrics
software quality classification
Stability criteria
Title Measuring robustness of Feature Selection techniques on software engineering datasets
URI https://ieeexplore.ieee.org/document/6009565