Unbiased variable importance for random forests


Bibliographic details
Published in: Communications in statistics. Theory and methods, Vol. 51, No. 5, pp. 1413-1425
Main author: Loecher, Markus
Format: Journal Article
Language: English
Published: Taylor & Francis, 4 March 2022
ISSN:0361-0926, 1532-415X
Description
Summary: The default variable-importance measure in random forests, Gini importance, has been shown to suffer from the bias of the underlying Gini-gain splitting criterion. While the alternative permutation importance is generally accepted as a reliable measure of variable importance, it is also computationally demanding and suffers from other shortcomings. We propose a simple solution to the misleading/untrustworthy Gini importance which can be viewed as an over-fitting problem: we compute the loss reduction on the out-of-bag instead of the in-bag training samples.
DOI:10.1080/03610926.2020.1764042
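
Illustration (not part of the catalog record or the article): the summary contrasts loss reduction measured on in-bag samples with loss reduction measured on out-of-bag samples. The sketch below shows only that contrast for a single split on a pure-noise feature, using the Python standard library. All function names (gini, gini_gain, best_threshold, inbag_vs_oob_gain) and the simulated data are hypothetical, and the plain Gini gain used here is a simplification; it should not be read as the estimator proposed in the article.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def gini_gain(xs, ys, threshold):
    """Impurity reduction of the split xs <= threshold, measured on (xs, ys)."""
    n = len(ys)
    if n == 0:
        return 0.0
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(ys) - weighted


def best_threshold(xs, ys):
    """Threshold with the largest Gini gain on the sample it is given."""
    return max(sorted(set(xs)), key=lambda t: gini_gain(xs, ys, t))


def inbag_vs_oob_gain(xs, ys, rng):
    """Choose a split on a bootstrap sample, then score it in-bag and out-of-bag."""
    n = len(ys)
    bag = [rng.randrange(n) for _ in range(n)]       # bootstrap indices (in-bag)
    in_bag = set(bag)
    oob = [i for i in range(n) if i not in in_bag]   # rows never drawn (out-of-bag)
    bx, by = [xs[i] for i in bag], [ys[i] for i in bag]
    ox, oy = [xs[i] for i in oob], [ys[i] for i in oob]
    t = best_threshold(bx, by)                       # split is fitted in-bag only
    return gini_gain(bx, by, t), gini_gain(ox, oy, t)


# Simulated data (an assumption for illustration): the feature carries no signal.
rng = random.Random(0)
n = 200
ys = [rng.randint(0, 1) for _ in range(n)]
noise = [rng.random() for _ in range(n)]

results = [inbag_vs_oob_gain(noise, ys, rng) for _ in range(50)]
mean_in = sum(r[0] for r in results) / len(results)
mean_oob = sum(r[1] for r in results) / len(results)
print(f"mean in-bag gain: {mean_in:.4f}, mean out-of-bag gain: {mean_oob:.4f}")
```

Because the threshold is chosen to maximize the gain on the in-bag sample, the in-bag figure is optimistic for a feature with no signal, which is the over-fitting view the summary describes. The out-of-bag figure scores the same split on rows that played no part in choosing it.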