Private-Shared Disentangled Multimodal VAE for Learning of Latent Representations

Bibliographic Details
Published in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1692 - 1700
Main Authors: Lee, Mihee; Pavlovic, Vladimir
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2021
Subjects: Computational modeling; Computer vision; Conferences; Data models; Internet; Pattern recognition; Task analysis
ISSN: 2160-7516
Online Access: https://ieeexplore.ieee.org/document/9523016
Abstract Multi-modal generative models represent an important family of deep models, whose goal is to facilitate representation learning on data with multiple views or modalities. However, current deep multi-modal models focus on the inference of shared representations, while neglecting the important private aspects of data within individual modalities. In this paper, we introduce a disentangled multi-modal variational autoencoder (DMVAE) that utilizes a disentangled VAE strategy to separate the private and shared latent spaces of multiple modalities. We demonstrate the utility of DMVAE on the two image modalities of the MNIST and Google Street View House Numbers (SVHN) datasets, as well as on the image and text modalities of the Oxford-102 Flowers dataset. Our experiments indicate the importance of retaining the private representation, as well as the private-shared disentanglement, to effectively direct the information across multiple analysis-synthesis conduits.
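The abstract's central idea, keeping a modality-specific (private) latent code alongside a cross-modal (shared) one, can be illustrated with a minimal sketch. The PyTorch-style code below is a hypothetical illustration, not the authors' implementation: the module names, dimensions, and the product-of-experts fusion of the shared posteriors are assumptions made for the example.

# Hypothetical sketch of a private-shared latent split for two modalities,
# loosely following the idea described in the abstract. Module names, sizes,
# and the product-of-experts fusion are illustrative assumptions, not the
# authors' code.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Encodes one modality into private and shared Gaussian posterior parameters."""
    def __init__(self, in_dim, private_dim, shared_dim, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Separate heads keep modality-specific (private) and cross-modal
        # (shared) factors in different latent subspaces.
        self.private_head = nn.Linear(hidden, 2 * private_dim)
        self.shared_head = nn.Linear(hidden, 2 * shared_dim)

    def forward(self, x):
        h = self.backbone(x)
        p_mu, p_logvar = self.private_head(h).chunk(2, dim=-1)
        s_mu, s_logvar = self.shared_head(h).chunk(2, dim=-1)
        return (p_mu, p_logvar), (s_mu, s_logvar)

def product_of_experts(mus, logvars):
    """Fuse per-modality shared posteriors into one Gaussian, including a unit prior expert."""
    precisions = [torch.ones_like(mus[0])] + [torch.exp(-lv) for lv in logvars]
    weighted_mus = [torch.zeros_like(mus[0])] + [m * torch.exp(-lv) for m, lv in zip(mus, logvars)]
    total_precision = sum(precisions)
    return sum(weighted_mus) / total_precision, -torch.log(total_precision)

def reparameterize(mu, logvar):
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

# Toy usage: each modality keeps its own private code; the shared code is fused.
enc_img, enc_txt = ModalityEncoder(784, 8, 16), ModalityEncoder(300, 8, 16)
x_img, x_txt = torch.randn(4, 784), torch.randn(4, 300)
(p_img, s_img), (p_txt, s_txt) = enc_img(x_img), enc_txt(x_txt)
s_mu, s_logvar = product_of_experts([s_img[0], s_txt[0]], [s_img[1], s_txt[1]])
z_shared = reparameterize(s_mu, s_logvar)   # common to both modalities
z_priv_img = reparameterize(*p_img)         # image-specific factors
z_priv_txt = reparameterize(*p_txt)         # text-specific factors

Each modality's decoder would then reconstruct its input from its private code together with the shared code, which is one way to realize the analysis-synthesis conduits the abstract refers to.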
Author Pavlovic, Vladimir
Lee, Mihee
Author_xml – sequence: 1
  givenname: Mihee
  surname: Lee
  fullname: Lee, Mihee
  email: ml1323@rutgers.edu
  organization: Rutgers University,Piscataway,NJ,USA
– sequence: 2
  givenname: Vladimir
  surname: Pavlovic
  fullname: Pavlovic, Vladimir
  email: vladimir@cs.rutgers.edu
  organization: Rutgers University,Piscataway,NJ,USA
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPRW53098.2021.00185
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
Discipline Applied Sciences
EISBN 1665448997
9781665448994
EISSN 2160-7516
EndPage 1700
ExternalDocumentID 9523016
Genre orig-research
GrantInformation_xml – fundername: National Science Foundation
  funderid: 10.13039/100000001
ISICitedReferencesCount 28
IsPeerReviewed false
IsScholarly false
Language English
PageCount 9
ParticipantIDs ieee_primary_9523016
PublicationCentury 2000
PublicationDate 2021-June
PublicationDateYYYYMMDD 2021-06-01
PublicationDate_xml – month: 06
  year: 2021
  text: 2021-June
PublicationDecade 2020
PublicationTitle IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops
PublicationTitleAbbrev CVPRW
PublicationYear 2021
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1692
SubjectTerms Computational modeling
Computer vision
Conferences
Data models
Internet
Pattern recognition
Task analysis
Title Private-Shared Disentangled Multimodal VAE for Learning of Latent Representations
URI https://ieeexplore.ieee.org/document/9523016
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3PT8IwFG4QTfSECsbf9uDRSbe223o0KME4ySSI3EjbvRES3AwM_37bsYAHL976I02TvrR9ff2-9yF0q13tQwDCcQPpOUyrxFGEpQ6XIqUsoVpBaeko6PfD8VjENXS34cIAQAk-g3tbLP_yk1yvbKisLWwI0_V30C5nzCNrttY2omIhV4JWNB2XiHZnFA8-OCXCYrg81_46WM3kXzIq5S3Sbfxv_kPU2tLxcLy5aI5QDbJj1Kj8R1ztzmUTvcULq1UGjs3CbHoeZyWzKJvOTaVk2n7miZzj0cMTNr4qrnKrTnGe4siMywo8KIGxFR8pW7bQe_dp2Ok5lWSCM_MILRyQjCtFzKnBUuUHjHNq9lWgqXnHSS-x3pwMeUjBD4GqMBWe6eQUUh0CBETSE1TP8gxOESbUEwqIDoUGJiVRidXvI6mfWqdR6jPUtCs0-VpnxZhUi3P-d_MN2u8NX6NJ9Nx_uUAH1hxrwNUlqheLFVyhPf1dzJaL69KoP2g9ov0