Unbiased variable importance for random forests
| Published in: | Communications in statistics. Theory and methods, Volume 51, Issue 5, pp. 1413-1425 |
|---|---|
| Main author: | |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Taylor & Francis, 4 March 2022 |
| Subject: | |
| ISSN: | 0361-0926, 1532-415X |
| Summary: | The default variable-importance measure in random forests, Gini importance, has been shown to suffer from the bias of the underlying Gini-gain splitting criterion. While the alternative permutation importance is generally accepted as a reliable measure of variable importance, it is also computationally demanding and suffers from other shortcomings. We propose a simple solution to the misleading, untrustworthy Gini importance, which can be viewed as an over-fitting problem: we compute the loss reduction on the out-of-bag samples instead of the in-bag training samples. |
| DOI: | 10.1080/03610926.2020.1764042 |
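
The idea stated in the summary, measuring each split's loss reduction on out-of-bag rows rather than on the rows the tree was trained on, can be illustrated with a minimal pure-Python sketch. This is not the article's estimator, which this record does not spell out; it simply re-evaluates the plain Gini decrease of every split on the out-of-bag rows. The toy data, the tiny tree learner, and all function names (`grow`, `add_gains`, `forest_importance`) are hypothetical and exist only for this illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def split_rows(X, idx, j, t):
    """Partition row indices by the rule X[i][j] <= t."""
    return [i for i in idx if X[i][j] <= t], [i for i in idx if X[i][j] > t]

def gain(y, idx, left, right):
    """Impurity decrease of a split, measured on the rows in `idx`."""
    if not idx:
        return 0.0
    parts = sum(len(s) / len(idx) * gini([y[i] for i in s]) for s in (left, right))
    return gini([y[i] for i in idx]) - parts

def grow(X, y, idx, depth):
    """Grow a small CART-style tree on the in-bag rows; None marks a leaf."""
    best = None
    if depth > 0:
        for j in range(len(X[0])):
            # Candidate thresholds: every observed value except the largest.
            for t in sorted({X[i][j] for i in idx})[:-1]:
                left, right = split_rows(X, idx, j, t)
                g = gain(y, idx, left, right)
                if best is None or g > best[0]:
                    best = (g, j, t, left, right)
    if best is None or best[0] <= 0.0:
        return None
    _, j, t, left, right = best
    return (j, t, grow(X, y, left, depth - 1), grow(X, y, right, depth - 1))

def add_gains(node, X, y, idx, n_total, imp):
    """Accumulate each split's weighted impurity decrease, measured on `idx`."""
    if node is None or not idx:
        return
    j, t, left_node, right_node = node
    left, right = split_rows(X, idx, j, t)
    imp[j] += len(idx) / n_total * gain(y, idx, left, right)
    add_gains(left_node, X, y, left, n_total, imp)
    add_gains(right_node, X, y, right, n_total, imp)

def forest_importance(X, y, n_trees=25, depth=3, seed=0):
    """Return (in-bag, out-of-bag) mean impurity decrease per feature."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    inbag_imp, oob_imp = [0.0] * p, [0.0] * p
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        oob = sorted(set(range(n)) - set(bag))       # rows the tree never saw
        tree = grow(X, y, bag, depth)
        add_gains(tree, X, y, bag, len(bag), inbag_imp)
        add_gains(tree, X, y, oob, max(len(oob), 1), oob_imp)
    return [v / n_trees for v in inbag_imp], [v / n_trees for v in oob_imp]

# Toy data (hypothetical): feature 0 is an informative binary variable,
# feature 1 is continuous pure noise with many possible split points.
rng = random.Random(1)
X = [[rng.randint(0, 1), rng.random()] for _ in range(200)]
y = [x[0] if rng.random() > 0.1 else 1 - x[0] for x in X]

inbag_imp, oob_imp = forest_importance(X, y)
print("in-bag :", inbag_imp)
print("OOB    :", oob_imp)
```

The split thresholds are chosen on the in-bag rows only, so the out-of-bag rows act as a held-out check on each split. Comparing the two printed vectors for the noise feature gives a rough sense of how much of its in-bag importance comes from over-fitting.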