Conditional permutation importance revisited

Bibliographic details
Published in:	BMC Bioinformatics, Volume 21, Issue 1, pp. 1-30
Main authors:	Debeer, Dries; Strobl, Carolin
Format:	Journal Article
Language:	English
Published:	London: BioMed Central, 14 July 2020 (BioMed Central Ltd; Springer Nature B.V; BMC)
Subjects:	Accuracy; Algorithms; Bioinformatics; Biomedical and Life Sciences; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Computer simulation; Conditional permutation importance; Life Sciences; Machine Learning and Artificial Intelligence in Bioinformatics; Methodology; Methodology Article; Microarrays; Open source software; Permutations; Pharmaceutical industry; R; Random forest; Random variables; Regression analysis; Source code
ISSN:	1471-2105

Description
Summary:	Background: Random forest based variable importance measures have become popular tools for assessing the contributions of the predictor variables in a fitted random forest. In this article we reconsider a frequently used variable importance measure, the Conditional Permutation Importance (CPI). We argue and illustrate that the CPI corresponds to a more partial quantification of variable importance and suggest several improvements in its methodology and implementation that enhance its practical value. In addition, we introduce the threshold value in the CPI algorithm as a parameter that can make the CPI more partial or more marginal. Results: By means of extensive simulations, where the original version of the CPI is used as the reference, we examine the impact of the proposed methodological improvements. The simulation results show how the improved CPI methodology increases the interpretability and stability of the computations. In addition, the newly proposed implementation decreases the computation times drastically and is more widely applicable. The improved CPI algorithm is made freely available as an add-on package to the open-source software R. Conclusion: The proposed methodology and implementation of the CPI are computationally faster and lead to more stable results. They have a beneficial impact on practical research by making random forest analyses more interpretable.
DOI:	10.1186/s12859-020-03622-2