Distributed feature selection: An application to microarray data classification

Bibliographic details
Published in:	Applied Soft Computing, Volume 30, pp. 136-150
Main authors:	Bolón-Canedo, V., Sánchez-Maroño, N., Alonso-Betanzos, A.
Format:	Journal Article
Language:	English
Published:	Elsevier B.V., 1 May 2015
Subjects:	Distributed learning; Feature selection; Microarray data
ISSN:	1568-4946, 1872-9681
Description
Summary:
• Feature selection is indispensable when dealing with microarray data.
• A new method for distributing the filtering process is proposed.
• The data is distributed by features and then merged in a final subset.
• The method is tested on 8 microarray datasets.
• The classification accuracy is maintained and the time considerably shortened.

Feature selection is often required as a preliminary step for many pattern recognition problems. However, most existing algorithms work only in a centralized fashion, i.e., using the whole dataset at once. This work proposes a new method for distributing the feature selection process. It distributes the data by features, i.e., according to a vertical distribution, and then performs a merging procedure that updates the feature subset according to improvements in classification accuracy. The effectiveness of the proposal is tested on microarray data, which poses a difficult challenge for researchers because of the large number of gene expression values it contains and its small sample size. The results on eight microarray datasets show that the execution time is considerably shortened while the performance is maintained or even improved compared to the standard algorithms applied to the non-partitioned datasets.
DOI:	10.1016/j.asoc.2015.01.035
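
The procedure outlined in the summary (split the features into disjoint groups, filter each group separately, then merge the per-group results only when classification accuracy improves) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the function names, the univariate mean-separation filter, the nearest-centroid classifier with leave-one-out accuracy, the binary 0/1 labels, and the synthetic data are all stand-ins for whatever filters, classifiers, and datasets the authors actually used. The groups are also processed in a plain loop here, whereas a distributed setting would filter them independently.

```python
import random
from statistics import mean, pstdev


def partition_features(n_features, n_parts, seed=0):
    """Split feature indices into disjoint, roughly equal random groups."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_parts] for i in range(n_parts)]


def filter_top_k(X, y, features, k):
    """Stand-in filter (hypothetical): rank by separation of the class means."""
    def score(f):
        a = [row[f] for row, label in zip(X, y) if label == 0]
        b = [row[f] for row, label in zip(X, y) if label == 1]
        return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)
    return sorted(features, key=score, reverse=True)[:k]


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    if not features:
        return 0.0
    hits = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:  # skip a class left empty by holding out sample i
                centroids[label] = [mean(r[f] for r in rows) for f in features]

        def dist(label):
            return sum((X[i][f] - c) ** 2
                       for f, c in zip(features, centroids[label]))

        if min(centroids, key=dist) == y[i]:
            hits += 1
    return hits / len(X)


def distributed_selection(X, y, n_parts=4, k=2, seed=0):
    """Filter each feature group, then merge subsets that improve accuracy."""
    selected, best = [], 0.0
    for part in partition_features(len(X[0]), n_parts, seed):
        candidate = selected + filter_top_k(X, y, part, k)
        acc = loo_accuracy(X, y, candidate)
        if acc > best:  # keep the merged subset only if accuracy improves
            selected, best = candidate, acc
    return sorted(selected), best


# Synthetic data: 20 samples, 40 features, only feature 0 carries signal.
rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(8.0 * label if f == 0 else 0.0, 1.0) for f in range(40)]
     for label in y]
features, accuracy = distributed_selection(X, y)
print(features, accuracy)
```

Because each group holds only a fraction of the features, the filter runs on much smaller inputs than it would on the full dataset, which is the source of the time savings the summary reports. Note that this sketch scores features and estimates accuracy on the same samples, so its accuracy figure is optimistic; a careful evaluation would hold out separate test data.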