Multi-modal semantic autoencoder for cross-modal retrieval

Bibliographic details
Published in:	Neurocomputing (Amsterdam), Vol. 331; pp. 165-175
Main authors:	Wu, Yiling; Wang, Shuhui; Huang, Qingming
Format:	Journal Article
Language:	English
Publication details:	Elsevier B.V., 28.02.2019
Subjects:	Autoencoder; Cross-modal retrieval; Multi-modal data
ISSN:	0925-2312, 1872-8286

Description
Summary:	Cross-modal retrieval has gained much attention in recent years. Most existing approaches, which form the research mainstream, learn projections that map data from different modalities into a common space where they can be compared directly. However, these approaches neglect the preservation of feature and semantic information, so their results are often unsatisfactory. In this paper, we propose a two-stage learning method for multi-modal mappings that project multi-modal data to low-dimensional embeddings preserving both feature and semantic information. In the first stage, we combine low-level feature information and high-level semantic information to learn feature-aware semantic code vectors. In the second stage, we use an encoder-decoder paradigm to learn the projections: the encoder projects feature vectors to code vectors, and the decoder projects code vectors back to feature vectors. This paradigm ensures that the embeddings preserve both feature and semantic information. An alternating minimization procedure is developed to solve the multi-modal semantic autoencoder optimization problem. Extensive experiments on three benchmark datasets demonstrate that the proposed method outperforms state-of-the-art cross-modal retrieval methods.
DOI:	10.1016/j.neucom.2018.11.042
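
Illustrative sketch: the summary describes an encoder that maps feature vectors to code vectors, a decoder that maps them back, and an alternating minimization procedure. The record does not give the paper's actual objective, so the following is only a toy linear version under stated assumptions: the code vectors `S` are taken as given (the first stage is not reproduced), the loss is an assumed sum of a reconstruction term and a code-fitting term weighted by a hypothetical parameter `lam`, and all names (`fit_modality`, `objective`, `W_enc`, `W_dec`) are invented for illustration.

```python
import numpy as np

def objective(X, S, W_enc, W_dec, lam):
    """Assumed loss: reconstruction error plus lam times code-fitting error."""
    Z = X @ W_enc
    return np.sum((X - Z @ W_dec) ** 2) + lam * np.sum((Z - S) ** 2)

def fit_modality(X, S, lam=1.0, n_iters=20, seed=0):
    """Alternate exact least-squares updates of the decoder and the encoder."""
    rng = np.random.default_rng(seed)
    k = S.shape[1]
    W_enc = 0.01 * rng.standard_normal((X.shape[1], k))
    X_pinv = np.linalg.pinv(X)
    history = []
    for _ in range(n_iters):
        # Decoder step: with the encoder fixed, solve min ||X - (X W_enc) W_dec||^2.
        W_dec = np.linalg.pinv(X @ W_enc) @ X
        # Encoder step: with the decoder fixed, solve the normal equations
        # X^T X W_enc (W_dec W_dec^T + lam I) = X^T X W_dec^T + lam X^T S.
        A = W_dec @ W_dec.T + lam * np.eye(k)
        W_enc = (W_dec.T + lam * (X_pinv @ S)) @ np.linalg.inv(A)
        history.append(objective(X, S, W_enc, W_dec, lam))
    return W_enc, W_dec, history

# Synthetic two-modality data that shares the same (assumed known) codes S.
rng = np.random.default_rng(42)
n, k, d_img, d_txt = 200, 4, 10, 8
S = rng.standard_normal((n, k))
X_img = S @ rng.standard_normal((k, d_img)) + 0.01 * rng.standard_normal((n, d_img))
X_txt = S @ rng.standard_normal((k, d_txt)) + 0.01 * rng.standard_normal((n, d_txt))

W_img, D_img, hist_img = fit_modality(X_img, S)
W_txt, D_txt, hist_txt = fit_modality(X_txt, S)

# Both modalities now live in the same k-dimensional code space,
# so items can be compared directly, e.g. by cosine similarity.
Z_img = X_img @ W_img
Z_txt = X_txt @ W_txt
Z_img_n = Z_img / np.linalg.norm(Z_img, axis=1, keepdims=True)
Z_txt_n = Z_txt / np.linalg.norm(Z_txt, axis=1, keepdims=True)
similarity = Z_img_n @ Z_txt_n.T  # entry (i, j): image i versus text j
```

Because each step exactly minimizes the assumed loss over one block of variables, the values stored in `history` should not increase from one iteration to the next. Ranking each row of `similarity` gives a text-retrieval list for the corresponding image query.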