An up-to-date comparison of state-of-the-art classification algorithms

Detailed bibliography
Published in: Expert Systems with Applications, Vol. 82, pp. 128-150
Main authors: Zhang, Chongsheng; Liu, Changchang; Zhang, Xiangliang; Almpanidis, George
Format: Journal Article
Language: English
Published: New York: Elsevier Ltd, 1 October 2017
Elsevier BV
ISSN: 0957-4174, 1873-6793
Description
Summary:
• Up-to-date report on the accuracy and efficiency of state-of-the-art classifiers.
• We compare the accuracy of 11 classification algorithms pairwise and groupwise.
• We examine the training, parameter-tuning, and testing time separately.
• GBDT and Random Forests yield the highest accuracy, outperforming SVM.
• GBDT is the fastest in testing, Naive Bayes the fastest in training.

Current benchmark reports of classification algorithms generally cover common classifiers and their variants but omit many algorithms introduced in recent years. Moreover, important properties such as the dependence on the number of classes and features, and the CPU running time, are typically not examined. In this paper, we carry out a comparative empirical study of both established classifiers and more recently proposed ones on 71 data sets from different domains, publicly available in the UCI and KEEL repositories. The 11 algorithms studied include Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. We find that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy, ranking in the top 5 alongside GBDT, RF, SVM, and C4.5, but its performance varies widely across data sets. Unsurprisingly, the top accuracy performers have average or slow training times. DL is the worst performer in terms of accuracy but the second fastest in prediction. SRC shows good accuracy but is the slowest classifier in both training and testing.
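The summary says the classifiers were compared "pairwise and groupwise" across 71 data sets, but this record does not describe the exact procedure. As a minimal sketch, assuming a table that maps each classifier to its per-data-set accuracies (same data-set order for every classifier), a pairwise comparison can be expressed as win/tie/loss counts and a groupwise comparison as Friedman-style average ranks. The function names and the tie tolerance `tol` are hypothetical illustrations, not the paper's method.

```python
from itertools import combinations


def pairwise_win_tie_loss(acc, tol=0.0):
    """For each pair (a, b), with a before b alphabetically, count the data sets
    where a beats, ties, or loses to b.

    acc maps classifier name -> list of accuracies, one per data set.
    Accuracies within `tol` of each other count as a tie (hypothetical rule).
    """
    results = {}
    for a, b in combinations(sorted(acc), 2):
        wins = ties = losses = 0
        for x, y in zip(acc[a], acc[b]):
            if abs(x - y) <= tol:
                ties += 1
            elif x > y:
                wins += 1
            else:
                losses += 1
        results[(a, b)] = (wins, ties, losses)
    return results


def average_ranks(acc):
    """Friedman-style average rank per classifier across data sets.

    Rank 1 is the most accurate on a data set; tied classifiers share the
    mean of the ranks they occupy.
    """
    names = list(acc)
    n_sets = len(next(iter(acc.values())))
    totals = {name: 0.0 for name in names}
    for d in range(n_sets):
        # Sort classifiers by accuracy on data set d, best first.
        scores = sorted(((acc[n][d], n) for n in names), reverse=True)
        i = 0
        while i < len(scores):
            # Extend j over the run of classifiers tied with position i.
            j = i
            while j + 1 < len(scores) and scores[j + 1][0] == scores[i][0]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
            for k in range(i, j + 1):
                totals[scores[k][1]] += shared
            i = j + 1
    return {name: totals[name] / n_sets for name in names}
```

A lower average rank means a classifier tends to be more accurate across the collection; win/tie/loss counts complement this by showing how often one classifier beats another head to head. A library routine such as SciPy's `rankdata` could replace the hand-written tie handling.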
DOI: 10.1016/j.eswa.2017.04.003