Cross-modal Variational Alignment of Latent Spaces

Bibliographic Details
Published in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops, pp. 4127-4136
Main authors: Theodoridis, Thomas; Chatzis, Theocharis; Solachidis, Vassilios; Dimitropoulos, Kosmas; Daras, Petros
Format: Conference proceeding
Language: English
Published: IEEE, 1 June 2020
ISSN: 2160-7516
Online access: Full text
Abstract In this paper, we propose a novel cross-modal variational alignment method for processing and relating information across different modalities. The proposed approach consists of two variational autoencoder (VAE) networks that generate and model the latent space of each modality. The first network is a multimodal variational autoencoder that directly maps one modality to the other, while the second is a single-modal variational autoencoder. To associate the two spaces, we apply variational alignment, which acts as a translation mechanism that projects the latent space of the first VAE onto that of the single-modal VAE through an intermediate distribution. Experimental results on four well-known datasets, covering two application domains (food image analysis and 3D hand pose estimation), show the generality of the proposed method and its superiority over a number of state-of-the-art approaches.
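Note The abstract does not state the alignment objective, so the following is only an illustrative sketch, not the authors' formulation. It assumes that both latent spaces are modeled as diagonal Gaussians (the usual VAE choice) and shows the closed-form KL divergence that is commonly used to pull one such distribution toward another. The function name `kl_diag_gaussians` and its parameters are hypothetical.

```python
import math


def kl_diag_gaussians(mu_p, logvar_p, mu_q, logvar_q):
    """Closed-form KL(p || q) between two diagonal Gaussians.

    Each argument is a sequence of floats of equal length: the means and
    log-variances of p and q per latent dimension. This is a generic VAE
    building block, assumed here for illustration only.
    """
    total = 0.0
    for mp, lp, mq, lq in zip(mu_p, logvar_p, mu_q, logvar_q):
        var_p = math.exp(lp)
        var_q = math.exp(lq)
        # Per-dimension term: 0.5 * (log(var_q/var_p) + (var_p + (mp-mq)^2)/var_q - 1)
        total += 0.5 * (lq - lp + (var_p + (mp - mq) ** 2) / var_q - 1.0)
    return total


# Identical distributions have zero divergence.
same = kl_diag_gaussians([0.0, 1.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0])

# Unit-variance Gaussians whose means differ by 1 in one dimension.
shifted = kl_diag_gaussians([1.0], [0.0], [0.0], [0.0])
```

In a setup like the one the abstract describes, a term of this kind could in principle be minimized between the latent distribution of one VAE and a target distribution, but whether the paper uses this exact term is not stated in this record.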
Author Dimitropoulos, Kosmas
Daras, Petros
Solachidis, Vassilios
Chatzis, Theocharis
Theodoridis, Thomas
Author_xml – sequence: 1
  givenname: Thomas
  surname: Theodoridis
  fullname: Theodoridis, Thomas
  organization: Centre for Research and Technology Hellas,Information Technologies Institute,Thessaloniki,Greece
– sequence: 2
  givenname: Theocharis
  surname: Chatzis
  fullname: Chatzis, Theocharis
  organization: Centre for Research and Technology Hellas,Information Technologies Institute,Thessaloniki,Greece
– sequence: 3
  givenname: Vassilios
  surname: Solachidis
  fullname: Solachidis, Vassilios
  organization: Centre for Research and Technology Hellas,Information Technologies Institute,Thessaloniki,Greece
– sequence: 4
  givenname: Kosmas
  surname: Dimitropoulos
  fullname: Dimitropoulos, Kosmas
  organization: Centre for Research and Technology Hellas,Information Technologies Institute,Thessaloniki,Greece
– sequence: 5
  givenname: Petros
  surname: Daras
  fullname: Daras, Petros
  organization: Centre for Research and Technology Hellas,Information Technologies Institute,Thessaloniki,Greece
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CVPRW50498.2020.00488
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 1728193605
9781728193601
EISSN 2160-7516
EndPage 4136
ExternalDocumentID 9150585
Genre orig-research
GroupedDBID 6IE
6IF
6IL
6IN
AAJGR
AAWTH
ABLEC
ACGFS
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
M43
OCL
RIE
RIL
ID FETCH-LOGICAL-i203t-c0e18a5cb566b521d925aa9c7a19346dae4ec576229497517c544adbdd862123
IEDL.DBID RIE
ISICitedReferencesCount 21
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000788279004035&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:30:40 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i203t-c0e18a5cb566b521d925aa9c7a19346dae4ec576229497517c544adbdd862123
PageCount 10
ParticipantIDs ieee_primary_9150585
PublicationCentury 2000
PublicationDate 2020-June
PublicationDateYYYYMMDD 2020-06-01
PublicationDate_xml – month: 06
  year: 2020
  text: 2020-June
PublicationDecade 2020
PublicationTitle IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops
PublicationTitleAbbrev CVPRW
PublicationYear 2020
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0001085593
Score 1.9343189
SourceID ieee
SourceType Publisher
StartPage 4127
SubjectTerms Decoding
Gallium nitride
Pose estimation
Probability distribution
Task analysis
Three-dimensional displays
Training
Title Cross-modal Variational Alignment of Latent Spaces
URI https://ieeexplore.ieee.org/document/9150585
WOSCitedRecordID wos000788279004035&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1