Learning Deep Generative Models With Doubly Stochastic Gradient MCMC


Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, No. 7, pp. 3084-3096
Main Authors: Du, Chao, Zhu, Jun, Zhang, Bo
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.07.2018
ISSN: 2162-237X, 2162-2388
Description
Summary: Deep generative models (DGMs), which are often organized in a hierarchical manner, provide a principled framework for capturing the underlying causal factors of data. Recent work on DGMs has focused on developing efficient and scalable variational inference methods that learn a single model under mean-field or parameterization assumptions. However, little work has been done on extending Markov chain Monte Carlo (MCMC) methods to Bayesian DGMs, even though MCMC enjoys many advantages over variational methods. We present doubly stochastic gradient MCMC, a simple and generic method for (approximate) Bayesian inference of DGMs in a collapsed continuous parameter space. At each MCMC sampling step, the algorithm randomly draws a mini-batch of data samples to estimate the gradient of the log-posterior, and further estimates the intractable expectation over hidden variables via a neural adaptive importance sampler, whose proposal distribution is parameterized by a deep neural network and learned jointly with the sampling process. We demonstrate the effectiveness of the method by learning various DGMs on a wide range of tasks, including density estimation, data generation, and missing data imputation. Our method outperforms many state-of-the-art competitors.
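The summary describes a two-level stochastic estimator: a mini-batch over data points approximates the sum in the log-posterior gradient, and a self-normalized importance sampler with an adaptively learned proposal approximates the inner expectation over hidden variables. The sketch below is a minimal, hypothetical illustration of that structure, not the authors' implementation: it assumes a toy one-parameter Gaussian model (z ~ N(0,1), x | z ~ N(theta*z, 1)) and a linear recognition proposal q(z|x) = N(a*x, s^2) standing in for the paper's deep proposal network, with a stochastic gradient Langevin dynamics (SGLD) transition as the sampler; all names and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from the toy model: z ~ N(0,1), x | z ~ N(theta_true * z, 1).
theta_true = 2.0
N = 1000
z_data = rng.standard_normal(N)
X = theta_true * z_data + rng.standard_normal(N)

def log_joint_grad_theta(x, z, theta):
    """d/dtheta of log p(x, z | theta) for the toy Gaussian model."""
    return (x - theta * z) * z

def importance_weights(x, z, theta, a, log_s):
    """Self-normalized weights w ∝ p(x, z | theta) / q_phi(z | x)."""
    log_p = -0.5 * z**2 - 0.5 * (x - theta * z) ** 2
    s = np.exp(log_s)
    log_q = -0.5 * ((z - a * x) / s) ** 2 - log_s
    log_w = log_p - log_q
    log_w -= log_w.max()  # numerical stability before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

theta = 0.5          # model parameter sampled by SGLD (illustrative init)
a, log_s = 0.0, 0.0  # proposal parameters: linear stand-in for a deep net
eps, lr, B, K = 1e-4, 1e-2, 32, 16

for step in range(2000):
    batch = rng.choice(N, size=B, replace=False)
    grad = -theta / 10.0  # gradient of an assumed N(0, 10) prior on theta
    for x in X[batch]:
        # Second source of stochasticity: K proposal draws per data point.
        z = a * x + np.exp(log_s) * rng.standard_normal(K)
        w = importance_weights(x, z, theta, a, log_s)
        # First source of stochasticity: mini-batch, rescaled by N / B.
        grad += (N / B) * np.sum(w * log_joint_grad_theta(x, z, theta))
        # Adapt the proposal by weighted maximum likelihood on the same draws.
        s2 = np.exp(2 * log_s)
        a += lr * np.sum(w * (z - a * x)) * x / s2
        log_s += lr * np.sum(w * ((z - a * x) ** 2 / s2 - 1.0))
    # SGLD transition on the (collapsed) continuous parameter space.
    theta += 0.5 * eps * grad + np.sqrt(eps) * rng.standard_normal()

print(f"posterior sample for theta after SGLD: {theta:.3f} (true {theta_true})")
```

In this toy model the marginal likelihood depends on theta only through theta^2, so the posterior is symmetric about zero and the chain settles near +2 or -2 depending on initialization; the paper's actual method applies the same doubly stochastic estimator to deep, hierarchical models rather than this scalar example.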
DOI: 10.1109/TNNLS.2017.2688499