The data complexity index to construct an efficient cross-validation method

Bibliographic details
Published in: Decision Support Systems, Vol. 50, No. 1, pp. 93-102
Main authors: Li, Der-Chiang; Fang, Yao-Hwei; Fang, Y.M. Frank
Format: Journal Article
Language: English
Publisher: Amsterdam: Elsevier B.V., 01.12.2010
Elsevier
Elsevier Sequoia S.A.
ISSN: 0167-9236, 1873-5797
Description
Abstract: Cross-validation is a widely used model evaluation method in data mining applications. However, determining appropriate parameter values for a valid evaluation, such as the training data size and the number of experiment runs, usually takes considerable effort. This study develops an efficient cross-validation method, called Complexity-based Efficient (CBE) cross-validation, for binary classification problems. CBE cross-validation establishes a complexity index, the CBE index, by exploring the geometric structure and noise of the data. The CBE index is used to calculate the optimal training data size and number of experiment runs, reducing model evaluation time for computationally expensive classification data sets. One simulated and three real data sets are used to validate the proposed method, which is compared with repeated random sub-sampling validation and K-fold cross-validation. The results show that all three methods achieve similar validation performance, while CBE cross-validation requires less training time than the other two.
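The abstract does not say how the CBE index is computed, so CBE itself is not sketched here. As a hedged illustration of the two baseline schemes the study compares against, the following minimal Python sketch (standard library only; the function names `k_fold_splits` and `random_subsampling_splits` are hypothetical) generates train/test index splits. In repeated random sub-sampling, the training data size (`train_size`) and the number of experiment runs (`runs`) are free parameters. These appear to be the quantities the CBE index is meant to set. K-fold instead fixes the training size at roughly (K-1)/K of the data and the number of model fits at K.

```python
import random
from typing import Iterator, List, Tuple

# A split is a pair of index lists: (training indices, test indices).
Split = Tuple[List[int], List[int]]


def k_fold_splits(n: int, k: int, seed: int = 0) -> Iterator[Split]:
    """Yield (train, test) splits for K-fold cross-validation.

    Every sample appears in exactly one test fold, so k model fits are needed.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def random_subsampling_splits(
    n: int, train_size: int, runs: int, seed: int = 0
) -> Iterator[Split]:
    """Yield (train, test) splits for repeated random sub-sampling validation.

    Each run draws a fresh random training set of `train_size` samples and
    tests on the rest; test sets of different runs may overlap.
    """
    if not 0 < train_size < n:
        raise ValueError("train_size must satisfy 0 < train_size < n")
    rng = random.Random(seed)
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[:train_size], idx[train_size:]
```

For example, `list(k_fold_splits(10, 3))` produces test folds of sizes 4, 3 and 3. Model fitting and scoring are omitted: each split would train a classifier on the training indices and evaluate it on the test indices, so total evaluation cost grows with both the training size and the number of splits.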
DOI: 10.1016/j.dss.2010.07.005