Autoencoder-based self-supervised hashing for cross-modal retrieval

Published in: Multimedia Tools and Applications, Vol. 80, No. 11, pp. 17257-17274
Main authors: Li, Yifan; Wang, Xuan; Cui, Lei; Zhang, Jiajia; Huang, Chengkai; Luo, Xuan; Qi, Shuhan
Format: Journal Article
Language: English
Publication details: New York: Springer US, 01.05.2021
Springer Nature B.V
ISSN: 1380-7501, 1573-7721
Description
Summary: Cross-modal retrieval has gained considerable attention in the era of the multimedia data explosion. Owing to their low storage cost and fast retrieval speed, hash learning-based methods have become increasingly popular in this field. Cross-modal retrieval faces two crucial bottlenecks: the heterogeneous gap between different modalities and the semantic gap among similar data from various modalities. To address these issues, we adopt a self-supervised approach that bridges the heterogeneous gap by generating cohesive features for different instances. To mitigate the semantic gap, we use triplet sampling to optimize the inter-modal and intra-modal semantic loss, which increases the discriminability of our approach. Experiments on two benchmark datasets show the efficiency and robustness of our method, and extended experiments show its scalability.
DOI: 10.1007/s11042-020-09599-7