A distributed evolutionary based instance selection algorithm for big data using Apache Spark

Bibliographic details
Published in: Applied Soft Computing, Volume 159, p. 111638
Main authors: Qin, Liyang; Wang, Xiaoli; Yin, Linzi; Jiang, Zhaohui
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 July 2024
ISSN:1568-4946, 1872-9681
Description
Summary: Instance selection is an important preprocessing technique in data mining and machine learning. In this paper, we proposed a novel evolutionary instance selection algorithm for big data. First, we defined a coarse-granularity chromosome structure to reduce the size of the search space and the cost of chromosome operations (recombination, mutation, etc.). Then, a stratified evolution strategy was proposed to remove the hyperparameter in the classic fitness function and achieve precise control over the instance reduction ratio. Finally, a sampling-based fitness function was proposed to reduce the time complexity. Experimental results showed that the new algorithm can complete the instance selection task on data sets with millions of instances within minutes. The 10-fold cross-validation also showed that the selection results on many datasets have high nearest-neighbor classification accuracy.
• We proposed a distributed coarse-granularity chromosome structure to reduce the search space and ensure load balance.
• We proposed a stratified evolution strategy to transform a multi-objective task into multiple single-objective tasks.
• We proposed a sampling-based fitness function with lower time complexity.
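The summary does not spell out the sampling-based fitness function, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the fitness of a candidate subset is its nearest-neighbor (1-NN) accuracy, estimated on a random sample of the data rather than on every instance. The names `nearest_label` and `sampled_fitness`, the squared Euclidean distance, and the toy data are all illustrative assumptions.

```python
import random


def nearest_label(point, selected):
    """Return the label of the selected instance closest to `point` (1-NN)."""
    best_label, best_dist = None, float("inf")
    for features, label in selected:
        # Squared Euclidean distance (an assumption; any metric would do).
        dist = sum((a - b) ** 2 for a, b in zip(point, features))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


def sampled_fitness(selected, data, sample_size, rng):
    """Estimate the 1-NN accuracy of `selected` on a random sample of `data`."""
    if not selected:
        return 0.0
    sample = rng.sample(data, min(sample_size, len(data)))
    hits = sum(
        1 for features, label in sample
        if nearest_label(features, selected) == label
    )
    return hits / len(sample)


# Toy data set: two well-separated classes, as (features, label) pairs.
data = [
    ((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((0.1, 0.3), "a"),
    ((5.0, 5.1), "b"), ((5.2, 4.9), "b"), ((4.8, 5.3), "b"),
]

# Candidate subset: one representative per class.
selected = [data[0], data[3]]
score = sampled_fitness(selected, data, sample_size=4, rng=random.Random(0))
print(score)
```

In this sketch, evaluating against every instance costs on the order of `len(selected) * len(data)` distance computations, while the sampled estimate costs `len(selected) * sample_size`, which is where the saving would come from.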
DOI:10.1016/j.asoc.2024.111638