Private-Shared Disentangled Multimodal VAE for Learning of Latent Representations

Bibliographic Details
Published in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1692-1700
Main Authors: Lee, Mihee; Pavlovic, Vladimir
Format: Conference paper
Language: English
Published: IEEE, 01.06.2021
ISSN: 2160-7516
Description
Summary: Multi-modal generative models represent an important family of deep models whose goal is to facilitate representation learning on data with multiple views or modalities. However, current deep multi-modal models focus on inferring shared representations while neglecting the important private aspects of data within individual modalities. In this paper, we introduce a disentangled multi-modal variational autoencoder (DMVAE) that uses a disentangled VAE strategy to separate the private and shared latent spaces of multiple modalities. We demonstrate the utility of DMVAE on two image modalities, the MNIST and Google Street View House Numbers (SVHN) datasets, as well as on image and text modalities from the Oxford-102 Flowers dataset. Our experiments indicate the importance of retaining the private representations, as well as of the private-shared disentanglement, in effectively directing information across multiple analysis-synthesis conduits.
DOI: 10.1109/CVPRW53098.2021.00185
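
Illustrative sketch: the private-shared split described in the summary can be made concrete with a minimal PyTorch sketch. Everything below is an assumption for illustration only, not the authors' implementation: the names (ModalityVAE, dmvae_step), the MLP encoders and decoders, the latent dimensions, and the cross-reconstruction term used here to tie the two shared codes together are all hypothetical; the published DMVAE defines its own architecture and shared-space consistency objective (see the DOI above).

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityVAE(nn.Module):
    """One modality branch: the encoder splits the latent code into a
    private part and a shared part; the decoder reconstructs the input
    from their concatenation."""
    def __init__(self, x_dim, private_dim, shared_dim, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        # Separate heads emit Gaussian parameters (mu, logvar) for each part.
        self.private_head = nn.Linear(hidden, 2 * private_dim)
        self.shared_head = nn.Linear(hidden, 2 * shared_dim)
        self.dec = nn.Sequential(
            nn.Linear(private_dim + shared_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, x_dim),
        )

    def encode(self, x):
        h = self.enc(x)
        mu_p, logvar_p = self.private_head(h).chunk(2, dim=-1)
        mu_s, logvar_s = self.shared_head(h).chunk(2, dim=-1)
        return (mu_p, logvar_p), (mu_s, logvar_s)

    def decode(self, z_private, z_shared):
        return self.dec(torch.cat([z_private, z_shared], dim=-1))

def reparameterize(mu, logvar):
    # Standard VAE reparameterization trick.
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

def kl_std_normal(mu, logvar):
    # KL divergence to a standard normal prior, summed over latent dims.
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)

def dmvae_step(vae_a, vae_b, x_a, x_b):
    """One illustrative training step: each modality keeps its own private
    code, while each input is also reconstructed with the OTHER modality's
    shared code, encouraging the shared codes to live in a common space."""
    (mu_pa, lv_pa), (mu_sa, lv_sa) = vae_a.encode(x_a)
    (mu_pb, lv_pb), (mu_sb, lv_sb) = vae_b.encode(x_b)
    z_pa, z_sa = reparameterize(mu_pa, lv_pa), reparameterize(mu_sa, lv_sa)
    z_pb, z_sb = reparameterize(mu_pb, lv_pb), reparameterize(mu_sb, lv_sb)
    # Self-reconstructions plus cross-reconstructions with swapped shared codes.
    recon = (
        F.mse_loss(vae_a.decode(z_pa, z_sa), x_a, reduction="sum")
        + F.mse_loss(vae_b.decode(z_pb, z_sb), x_b, reduction="sum")
        + F.mse_loss(vae_a.decode(z_pa, z_sb), x_a, reduction="sum")
        + F.mse_loss(vae_b.decode(z_pb, z_sa), x_b, reduction="sum")
    )
    kl = (kl_std_normal(mu_pa, lv_pa) + kl_std_normal(mu_sa, lv_sa)
          + kl_std_normal(mu_pb, lv_pb) + kl_std_normal(mu_sb, lv_sb)).sum()
    return (recon + kl) / x_a.size(0)

# Hypothetical usage with flattened MNIST (784-d) and SVHN (3072-d) inputs;
# the shared dimension must match across modalities so codes can be swapped.
vae_a = ModalityVAE(x_dim=784, private_dim=8, shared_dim=16)
vae_b = ModalityVAE(x_dim=3072, private_dim=8, shared_dim=16)
loss = dmvae_step(vae_a, vae_b, torch.rand(32, 784), torch.rand(32, 3072))
loss.backward()

The cross-reconstruction term is one simple way to express the exchangeability of shared codes across modalities; the paper's actual consistency mechanism for the shared space may differ.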