Inferring feature importance with uncertainties with application to large genotype data

Bibliographic details
Published in: PLoS Computational Biology, Volume 19, Issue 3, p. e1010963
Main authors: Johnsen, Pål Vegard; Strümke, Inga; Langaas, Mette; DeWan, Andrew Thomas; Riemer-Sørensen, Signe
Format: Journal Article
Language: English
Published: United States, Public Library of Science (PLoS), 1 March 2023
ISSN: 1553-7358, 1553-734X
Description
Summary: Estimating feature importance, which is the contribution of a prediction or several predictions due to a feature, is an essential aspect of explaining data-based models. Besides explaining the model itself, an equally relevant question is which features are important in the underlying data generating process. We present a Shapley-value-based framework for inferring the importance of individual features, including uncertainty in the estimator. We build upon the recently published model-agnostic feature importance score of SAGE (Shapley additive global importance) and introduce Sub-SAGE. For tree-based models, it has the advantage that it can be estimated without computationally expensive resampling. We argue that for all model types the uncertainties in our Sub-SAGE estimator can be estimated using bootstrapping and demonstrate the approach for tree ensemble methods. The framework is exemplified on synthetic data as well as large genotype data for predicting feature importance with respect to obesity.
Notes:
The authors declare that they have no competing interests.
DOI: 10.1371/journal.pcbi.1010963