Cross-modal Variational Alignment of Latent Spaces
| Published in: | IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops, pp. 4127-4136 |
|---|---|
| Main Authors: | Theodoridis, Thomas; Chatzis, Theocharis; Solachidis, Vassilios; Dimitropoulos, Kosmas; Daras, Petros |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 1 June 2020 |
| Subjects: | Decoding; Gallium nitride; Pose estimation; Probability distribution; Task analysis; Three-dimensional displays; Training |
| ISSN: | 2160-7516 |
| Online Access: | Full text |
| Abstract | In this paper, we propose a novel cross-modal variational alignment method for processing and relating information across different modalities. The proposed approach consists of two variational autoencoder (VAE) networks that generate and model the latent space of each modality. The first network is a multi-modal variational autoencoder that directly maps one modality to the other, while the second is a single-modal variational autoencoder. To associate the two spaces, we apply variational alignment, which acts as a translation mechanism that projects the latent space of the first VAE onto that of the single-modal VAE through an intermediate distribution. Experimental results on four well-known datasets, covering two different application domains (food image analysis and 3D hand pose estimation), show the generality of the proposed method and its superiority over a number of state-of-the-art approaches. |
|---|---|
| Author | Theodoridis, Thomas; Chatzis, Theocharis; Solachidis, Vassilios; Dimitropoulos, Kosmas; Daras, Petros |
| Affiliation | Centre for Research and Technology Hellas, Information Technologies Institute, Thessaloniki, Greece (all five authors) |
| ContentType | Conference Proceeding |
| DOI | 10.1109/CVPRW50498.2020.00488 |
| Discipline | Applied Sciences |
| EISBN | 1728193605 9781728193601 |
| EISSN | 2160-7516 |
| EndPage | 4136 |
| ExternalDocumentID | 9150585 |
| Genre | orig-research |
| ISICitedReferencesCount | 21 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 10 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-June |
| PublicationDateYYYYMMDD | 2020-06-01 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops |
| PublicationTitleAbbrev | CVPRW |
| PublicationYear | 2020 |
| Publisher | IEEE |
| StartPage | 4127 |
| SubjectTerms | Decoding; Gallium nitride; Pose estimation; Probability distribution; Task analysis; Three-dimensional displays; Training |
| Title | Cross-modal Variational Alignment of Latent Spaces |
| URI | https://ieeexplore.ieee.org/document/9150585 |
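
The abstract describes aligning the latent distributions of two VAEs but does not state the objective used. As a rough illustration only, the sketch below computes the closed-form KL divergence between two diagonal Gaussians, a quantity commonly used to compare VAE latent distributions. Treating it as the alignment term is an assumption, not the paper's stated method, and the function name `kl_diag_gaussians` and its parameters are hypothetical.

```python
import math


def kl_diag_gaussians(mu_p, logvar_p, mu_q, logvar_q):
    """Closed-form KL( N(mu_p, diag(exp(logvar_p))) || N(mu_q, diag(exp(logvar_q))) ).

    Hypothetical helper: the record does not specify the alignment loss,
    so this is only one plausible building block.
    Each argument is a sequence of floats with one entry per latent dimension.
    """
    total = 0.0
    for mp, lp, mq, lq in zip(mu_p, logvar_p, mu_q, logvar_q):
        # Per-dimension term:
        # 0.5 * [ log(var_q / var_p) + (var_p + (mu_p - mu_q)^2) / var_q - 1 ]
        total += 0.5 * (lq - lp + (math.exp(lp) + (mp - mq) ** 2) / math.exp(lq) - 1.0)
    return total
```

The divergence is zero when the two distributions coincide and grows as their means or variances move apart, which is why a term of this kind could plausibly serve to pull one latent space toward another.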