Mixture Models With a Prior on the Number of Components


Bibliographic Details
Published in: Journal of the American Statistical Association, Volume 113, Issue 521, pp. 340-356
Main authors: Miller, Jeffrey W.; Harrison, Matthew T.
Format: Journal Article
Language: English
Published: United States: Taylor & Francis, 2 January 2018
Taylor & Francis Group, LLC
Taylor & Francis Ltd
ISSN: 0162-1459, 1537-274X
Description
Summary: A natural Bayesian approach for mixture models with an unknown number of components is to take the usual finite mixture model with symmetric Dirichlet weights and put a prior on the number of components; that is, to use a mixture of finite mixtures (MFM). The most commonly used method of inference for MFMs is reversible jump Markov chain Monte Carlo, but it can be nontrivial to design good reversible jump moves, especially in high-dimensional spaces. Meanwhile, there are samplers for Dirichlet process mixture (DPM) models that are relatively simple and are easily adapted to new applications. It turns out that many of the essential properties of DPMs are also exhibited by MFMs (an exchangeable partition distribution, restaurant process, random measure representation, and stick-breaking representation), and crucially, the MFM analogues are simple enough that they can be used much like the corresponding DPM properties. Consequently, many of the powerful methods developed for inference in DPMs can be directly applied to MFMs as well; this simplifies the implementation of MFMs and can substantially improve mixing. We illustrate with real and simulated data, including high-dimensional gene expression data used to discriminate cancer subtypes. Supplementary materials for this article are available online.
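As a rough illustration of the model the summary describes, the sketch below draws a number of components K from a prior, draws symmetric Dirichlet weights, and then assigns observations to components. It is not the authors' code: the function names, the choice K - 1 ~ Poisson(lam), and the default values of `gamma` and `lam` are all assumptions made for this example, and component parameters and data are omitted.

```python
import math
import random


def sample_num_components(rng, lam=1.0):
    """Draw K with K - 1 ~ Poisson(lam). This prior is an illustrative assumption."""
    # Knuth's multiplication method for a Poisson draw (fine for small lam).
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count + 1  # shift so that K >= 1


def sample_mfm_partition(n, gamma=1.0, lam=1.0, seed=0):
    """Sample (K, weights, assignments) from a mixture-of-finite-mixtures prior."""
    rng = random.Random(seed)

    # Step 1: prior on the number of components.
    k = sample_num_components(rng, lam)

    # Step 2: symmetric Dirichlet(gamma, ..., gamma) weights,
    # obtained by normalizing independent Gamma(gamma, 1) draws.
    draws = [rng.gammavariate(gamma, 1.0) for _ in range(k)]
    total = sum(draws)
    weights = [d / total for d in draws]

    # Step 3: assign each of the n observations to a component.
    z = rng.choices(range(k), weights=weights, k=n)
    return k, weights, z


if __name__ == "__main__":
    k, weights, z = sample_mfm_partition(n=20, seed=1)
    print("components K:", k)
    print("non-empty clusters:", len(set(z)))
```

By construction, the number of non-empty clusters `len(set(z))` can be smaller than K, since some components may receive no observations.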
The authors gratefully acknowledge support from the National Science Foundation (NSF) grants DMS-1007593, DMS-1309004, and DMS-1045153, the National Institute of Mental Health (NIMH) grant R01MH102840, the Defense Advanced Research Projects Agency (DARPA) contract FA8650-11-1-715, and the National Institutes of Health (NIH) grant R01ES020619.
DOI: 10.1080/01621459.2016.1255636