A noise-detection based AdaBoost algorithm for mislabeled data

Noise sensitivity is known as a key related issue of AdaBoost algorithm. Previous works exhibit that AdaBoost is prone to be overfitting in dealing with the noisy data sets due to its consistent high weights assignment on hard-to-learn instances (mislabeled instances or outliers). In this paper, a n...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Pattern recognition Ročník 45; číslo 12; s. 4451 - 4465
Hlavní autoři: Cao, Jingjing, Kwong, Sam, Wang, Ran
Médium: Journal Article
Jazyk:angličtina
Vydáno: Kidlington Elsevier Ltd 01.12.2012
Elsevier
Témata:
ISSN:0031-3203, 1873-5142
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Noise sensitivity is known as a key related issue of AdaBoost algorithm. Previous works exhibit that AdaBoost is prone to be overfitting in dealing with the noisy data sets due to its consistent high weights assignment on hard-to-learn instances (mislabeled instances or outliers). In this paper, a new boosting approach, named noise-detection based AdaBoost (ND-AdaBoost), is exploited to combine classifiers by emphasizing on training misclassified noisy instances and correctly classified non-noisy instances. Specifically, the algorithm is designed by integrating a noise-detection based loss function into AdaBoost to adjust the weight distribution at each iteration. A k-nearest-neighbor (k-NN) and an expectation maximization (EM) based evaluation criteria are both constructed to detect noisy instances. Further, a regeneration condition is presented and analyzed to control the ensemble training error bound of the proposed algorithm which provides theoretical support. Finally, we conduct some experiments on selected binary UCI benchmark data sets and demonstrate that the proposed algorithm is more robust than standard and other types of AdaBoost for noisy data sets. ► A noise-detection based loss function is proposed to distinguish mislabeled data from misclassified instances. ► A new regeneration condition is added to guarantee the generalization performance of the proposed algorithm. ► The proposed methods significantly outperform most of the state-of-the-art methods in high noise region.
Bibliografie:ObjectType-Article-2
SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 23
ISSN:0031-3203
1873-5142
DOI:10.1016/j.patcog.2012.05.002