Unbiased variable importance for random forests
The default variable-importance measure in random forests, Gini importance, has been shown to suffer from the bias of the underlying Gini-gain splitting criterion. While the alternative permutation importance is generally accepted as a reliable measure of variable importance, it is also computationa...
| Published in: | Communications in statistics. Theory and methods, Volume 51, Issue 5, pp. 1413-1425 |
|---|---|
| Main author: | Loecher, Markus |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Taylor & Francis, 4 March 2022 |
| Subject: | Gini impurity; random forests; trees; variable importance |
| ISSN: | 0361-0926, 1532-415X |
| Online access: | Get full text |
| Abstract | The default variable-importance measure in random forests, Gini importance, has been shown to suffer from the bias of the underlying Gini-gain splitting criterion. While the alternative permutation importance is generally accepted as a reliable measure of variable importance, it is also computationally demanding and suffers from other shortcomings. We propose a simple solution to the misleading/untrustworthy Gini importance which can be viewed as an over-fitting problem: we compute the loss reduction on the out-of-bag instead of the in-bag training samples. |
|---|---|
| Author | Loecher, Markus |
| Author details | Markus Loecher (ORCID: 0000-0002-6823-1994), Department of Business and Economics, Berlin School of Economics and Law |
| ContentType | Journal Article |
| Copyright | 2020 Taylor & Francis Group, LLC |
| DOI | 10.1080/03610926.2020.1764042 |
| Discipline | Statistics; Mathematics |
| EISSN | 1532-415X |
| EndPage | 1425 |
| Genre | Research Article |
| ISSN | 0361-0926 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Language | English |
| ORCID | 0000-0002-6823-1994 |
| PageCount | 13 |
| PublicationDate | 2022-03-04 |
| PublicationTitle | Communications in statistics. Theory and methods |
| PublicationYear | 2022 |
| Publisher | Taylor & Francis |
| StartPage | 1413 |
| SubjectTerms | Gini impurity; random forests; trees; variable importance |
| Title | Unbiased variable importance for random forests |
| URI | https://www.tandfonline.com/doi/abs/10.1080/03610926.2020.1764042 |
| Volume | 51 |
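
The abstract frames the bias of Gini importance as an over-fitting problem and proposes measuring the loss reduction on out-of-bag rather than in-bag samples. The sketch below is only a rough illustration of that idea, not the estimator defined in the paper. It assumes a single split on a pure-noise feature with binary labels, and the function names (`gini`, `gini_gain`, `best_threshold`, `simulate`) are hypothetical. The split threshold is chosen on a bootstrap (in-bag) sample, and its Gini gain is then re-evaluated on the out-of-bag rows.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(xs, ys, threshold):
    """Impurity reduction of the split x <= threshold, measured on (xs, ys)."""
    n = len(ys)
    if n == 0:
        return 0.0
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return gini(ys) - len(left) / n * gini(left) - len(right) / n * gini(right)


def best_threshold(xs, ys):
    """Threshold with the largest Gini gain on (xs, ys)."""
    candidates = sorted(set(xs))[:-1]  # the largest value cannot split the data
    return max(candidates, key=lambda t: gini_gain(xs, ys, t))


def simulate(n=200, reps=100, seed=0):
    """Mean in-bag and out-of-bag gain for a feature unrelated to the label."""
    rng = random.Random(seed)
    in_bag_total = 0.0
    oob_total = 0.0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]      # noise feature
        ys = [rng.randint(0, 1) for _ in range(n)]  # labels independent of xs
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap row indices
        drawn = set(idx)
        oob = [i for i in range(n) if i not in drawn]
        xb = [xs[i] for i in idx]
        yb = [ys[i] for i in idx]
        t = best_threshold(xb, yb)                  # split selected in-bag
        in_bag_total += gini_gain(xb, yb, t)
        oob_total += gini_gain([xs[i] for i in oob], [ys[i] for i in oob], t)
    return in_bag_total / reps, oob_total / reps


if __name__ == "__main__":
    mean_in_bag, mean_oob = simulate()
    print(f"mean in-bag gain: {mean_in_bag:.4f}")
    print(f"mean OOB gain:    {mean_oob:.4f}")
```

Because the threshold is tuned to the in-bag rows, the in-bag gain of a noise feature should on average exceed the gain of the same split on the out-of-bag rows. The plain out-of-bag gain computed here is still non-negative, so the sketch shows the direction of the effect, not an unbiased importance measure.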