Finding structure in data using multivariate tree boosting

Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a u...

Full description

Saved in:
Bibliographic Details
Published in:Psychological methods Vol. 21; no. 4; p. 583
Main Authors: Miller, Patrick J, Lubke, Gitta H, McArtor, Daniel B, Bergeman, C S
Format: Journal Article
Language:English
Published: United States 01.12.2016
Subjects:
ISSN:1939-1463, 1939-1463
Online Access:Get more information
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause 2 or more outcome variables to covary. We provide the R package "mvtboost" to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package "gbm" (Ridgeway, 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. (PsycINFO Database Record
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1939-1463
1939-1463
DOI:10.1037/met0000087