Approximate Bayesian computation with the Wasserstein distance

Bibliographic details
Published in:	Journal of the Royal Statistical Society. Series B, Statistical methodology, Vol. 81, No. 2, pp. 235-269
Main authors:	Bernton, Espen; Jacob, Pierre E.; Gerber, Mathieu; Robert, Christian P.
Format:	Journal Article
Language:	English
Published:	Oxford: Wiley, 1 April 2019; Oxford University Press
Subjects:	Approximate Bayesian computation; Bayesian analysis; Bayesian theory; Biology; Computation; Computer simulation; Data; Data collection; Datasets; Equations; Generative models; Hilbert curve; Hilbert space; Likelihood‐free inference; Optimal transport; Property; Queueing; Queues; Regression analysis; Statistical methods; Statistical models; Statistics; Summaries; Time series; Time series analysis; Volatility; Wasserstein distance
ISSN:	1369-7412, 1467-9868

Description
Summary:	A growing number of generative statistical models do not permit the numerical evaluation of their likelihood functions. Approximate Bayesian computation has become a popular approach to overcome this issue, in which one simulates synthetic data sets given parameters and compares summaries of these data sets with the corresponding observed values. We propose to avoid the use of summaries and the ensuing loss of information by instead using the Wasserstein distance between the empirical distributions of the observed and synthetic data. This generalizes the well-known approach of using order statistics within approximate Bayesian computation to arbitrary dimensions. We describe how recently developed approximations of the Wasserstein distance allow the method to scale to realistic data sizes, and we propose a new distance based on the Hilbert space filling curve. We provide a theoretical study of the method proposed, describing consistency as the threshold goes to 0 while the observations are kept fixed, and concentration properties as the number of observations grows. Various extensions to time series data are discussed. The approach is illustrated on various examples, including univariate and multivariate g-and-k distributions, a toggle switch model from systems biology, a queuing model and a Lévy-driven stochastic volatility model.
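
Illustrative note:	The one-dimensional case mentioned in the summary, where the Wasserstein distance between two equal-size samples reduces to a comparison of order statistics, can be sketched as a plain rejection sampler. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (wasserstein_1d, abc_rejection), the normal location model, the uniform prior, and the threshold value are all hypothetical choices made for illustration.

```python
import numpy as np


def wasserstein_1d(x, y, p=1):
    """p-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal coupling matches sorted values,
    so the distance is computed from the order statistics.
    """
    x_sorted = np.sort(np.asarray(x, dtype=float))
    y_sorted = np.sort(np.asarray(y, dtype=float))
    if x_sorted.shape != y_sorted.shape:
        raise ValueError("samples must have the same length")
    return float(np.mean(np.abs(x_sorted - y_sorted) ** p) ** (1.0 / p))


def abc_rejection(observed, sample_prior, simulate, threshold, n_draws, rng):
    """Keep prior draws whose synthetic data lie within `threshold` of the observations."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        synthetic = simulate(theta, len(observed), rng)
        if wasserstein_1d(observed, synthetic) <= threshold:
            accepted.append(theta)
    return np.array(accepted)


# Hypothetical toy model: normal data with unknown location, uniform prior.
rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, scale=1.0, size=100)
posterior_draws = abc_rejection(
    observed,
    sample_prior=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda theta, n, r: r.normal(loc=theta, scale=1.0, size=n),
    threshold=0.3,  # assumed tolerance; smaller values accept fewer draws
    n_draws=5000,
    rng=rng,
)
print(len(posterior_draws), posterior_draws.mean())
```

Sorting only works in one dimension. For multivariate data the summary points to approximations of the Wasserstein distance and to a distance based on the Hilbert curve, neither of which is shown here.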
DOI:	10.1111/rssb.12312