Autoencoder-based self-supervised hashing for cross-modal retrieval

Cross-modal retrieval has gained lots of attention in the era of the multimedia data explosion. Taking advantage of low storage cost and fast retrieval speed, hash learning-based methods become more and more popular in this field. The crucial bottlenecks of cross-modal retrieval are twofold: the het...

Full description

Saved in:
Bibliographic Details
Published in:Multimedia tools and applications Vol. 80; no. 11; pp. 17257 - 17274
Main Authors: Li, Yifan, Wang, Xuan, Cui, Lei, Zhang, Jiajia, Huang, Chengkai, Luo, Xuan, Qi, Shuhan
Format: Journal Article
Language:English
Published: New York Springer US 01.05.2021
Springer Nature B.V
Subjects:
ISSN:1380-7501, 1573-7721
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Cross-modal retrieval has gained lots of attention in the era of the multimedia data explosion. Taking advantage of low storage cost and fast retrieval speed, hash learning-based methods become more and more popular in this field. The crucial bottlenecks of cross-modal retrieval are twofold: the heterogeneous gap in different modalities and the semantic gap among similar data with various modalities. To address these issues, we adopt self-supervised fashion to bridge the heterogeneous gap by generating the cohesive features of different instances. To mitigate the semantic gap, we use triplet sampling to optimize the semantic loss in inter-modal and intra-modal, which increase the discriminability of our approach. Experimental on two benchmark datasets show the efficiency and robustness of our method, and the extended experiments show the scalability.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1380-7501
1573-7721
DOI:10.1007/s11042-020-09599-7