Unbiased variable importance for random forests

The default variable-importance measure in random forests, Gini importance, has been shown to suffer from the bias of the underlying Gini-gain splitting criterion. While the alternative permutation importance is generally accepted as a reliable measure of variable importance, it is also computationa...

Detailed bibliography
Published in: Communications in statistics. Theory and methods, Volume 51, Issue 5, pp. 1413-1425
Main author: Loecher, Markus
Format: Journal Article
Language: English
Publication details: Taylor & Francis, 2022-03-04
Subjects: Gini impurity; random forests; trees; variable importance
ISSN: 0361-0926, 1532-415X
Abstract The default variable-importance measure in random forests, Gini importance, has been shown to suffer from the bias of the underlying Gini-gain splitting criterion. While the alternative permutation importance is generally accepted as a reliable measure of variable importance, it is also computationally demanding and suffers from other shortcomings. We propose a simple solution to the misleading/untrustworthy Gini importance which can be viewed as an over-fitting problem: we compute the loss reduction on the out-of-bag instead of the in-bag training samples.
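
As a rough illustration of the idea in the abstract, the sketch below (Python, standard library only) fits a single split on a bootstrap sample of a pure-noise feature and then measures the Gini reduction of that same split on the in-bag and on the out-of-bag observations. All function names (`gini`, `gini_gain`, `best_threshold`, `inbag_vs_oob_gain`) and the simulated data are assumptions made for this example. It is a simplified reading of "loss reduction on the out-of-bag samples" for one split, not the estimator defined in the paper.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def gini_gain(xs, ys, threshold):
    """Impurity reduction from splitting at x <= threshold, on the given sample."""
    n = len(ys)
    if n == 0:
        return 0.0
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(ys) - children


def best_threshold(xs, ys):
    """Threshold that maximizes the Gini gain on the sample it is given."""
    return max(sorted(set(xs)), key=lambda t: gini_gain(xs, ys, t))


def inbag_vs_oob_gain(xs, ys, rng):
    """Choose one split on a bootstrap sample, then score it in-bag and out-of-bag."""
    n = len(ys)
    inbag = [rng.randrange(n) for _ in range(n)]  # bootstrap draw with replacement
    chosen = set(inbag)
    oob = [i for i in range(n) if i not in chosen]  # rows never drawn
    xi, yi = [xs[i] for i in inbag], [ys[i] for i in inbag]
    xo, yo = [xs[i] for i in oob], [ys[i] for i in oob]
    t = best_threshold(xi, yi)  # the split only ever sees in-bag data
    return gini_gain(xi, yi, t), gini_gain(xo, yo, t)


rng = random.Random(0)
# Hypothetical pure-noise data: x carries no information about y.
xs = [rng.random() for _ in range(200)]
ys = [rng.randint(0, 1) for _ in range(200)]
runs = [inbag_vs_oob_gain(xs, ys, rng) for _ in range(50)]
mean_inbag = sum(g for g, _ in runs) / len(runs)
mean_oob = sum(g for _, g in runs) / len(runs)
print(f"mean in-bag gain: {mean_inbag:.4f}, mean OOB gain: {mean_oob:.4f}")
```

Because the threshold is tuned on the in-bag sample, the in-bag gain of a noise feature would typically be inflated relative to the out-of-bag gain, which matches the over-fitting view described in the abstract.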
Author Loecher, Markus
Author_xml – sequence: 1
  givenname: Markus
  orcidid: 0000-0002-6823-1994
  surname: Loecher
  fullname: Loecher, Markus
  organization: Department of Business and Economics, Berlin School of Economics and Law
ContentType Journal Article
Copyright 2020 Taylor & Francis Group, LLC 2020
Copyright_xml – notice: 2020 Taylor & Francis Group, LLC 2020
DBID AAYXX
CITATION
DOI 10.1080/03610926.2020.1764042
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Statistics
Mathematics
EISSN 1532-415X
EndPage 1425
ExternalDocumentID 10_1080_03610926_2020_1764042
1764042
Genre Research Article
IsPeerReviewed true
IsScholarly true
Issue 5
Language English
ORCID 0000-0002-6823-1994
PageCount 13
PublicationCentury 2000
PublicationDate 2022-03-04
PublicationDateYYYYMMDD 2022-03-04
PublicationDate_xml – month: 03
  year: 2022
  text: 2022-03-04
  day: 04
PublicationDecade 2020
PublicationTitle Communications in statistics. Theory and methods
PublicationYear 2022
Publisher Taylor & Francis
Publisher_xml – name: Taylor & Francis
StartPage 1413
SubjectTerms Gini impurity
random forests
trees
Variable importance
Title Unbiased variable importance for random forests
URI https://www.tandfonline.com/doi/abs/10.1080/03610926.2020.1764042
Volume 51