Cost-sensitive boosting algorithms: Do we really need them?
We provide a unifying perspective for two decades of work on cost-sensitive Boosting algorithms. When analyzing the literature 1997–2016, we find 15 distinct cost-sensitive variants of the original algorithm; each of these has its own motivation and claims to superiority—so who should we believe? In...
Uloženo v:
| Vydáno v: | Machine learning Ročník 104; číslo 2-3; s. 359 - 384 |
|---|---|
| Hlavní autoři: | , , , , |
| Médium: | Journal Article |
| Jazyk: | angličtina |
| Vydáno: |
New York
Springer US
01.09.2016
Springer Nature B.V |
| Témata: | |
| ISSN: | 0885-6125, 1573-0565 |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | We provide a unifying perspective for two decades of work on cost-sensitive Boosting algorithms. When analyzing the literature 1997–2016, we find 15 distinct cost-sensitive variants of the original algorithm; each of these has its own motivation and claims to superiority—so who should we believe? In this work we critique the Boosting literature using
four
theoretical frameworks: Bayesian decision theory, the functional gradient descent view, margin theory, and probabilistic modelling. Our finding is that only
three
algorithms are fully supported—and the probabilistic model view suggests that all require their outputs to be
calibrated
for best performance. Experiments on 18 datasets across 21 degrees of imbalance support the hypothesis—showing that once calibrated, they perform equivalently, and outperform all others. Our final recommendation—based on simplicity, flexibility and performance—is to use the
original
Adaboost algorithm with a shifted decision threshold and
calibrated
probability estimates. |
|---|---|
| Bibliografie: | SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 14 ObjectType-Article-1 ObjectType-Feature-2 content type line 23 |
| ISSN: | 0885-6125 1573-0565 |
| DOI: | 10.1007/s10994-016-5572-x |