Unsupervised Contrastive Hashing With Autoencoder Semantic Similarity for Cross-Modal Retrieval in Remote Sensing

Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 18, pp. 6047–6059
Main Authors: Liu, Na; Wu, Guodong; Huang, Yonggui; Chen, Xi; Li, Qingdu; Wan, Lihong
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
ISSN: 1939-1404, 2151-1535
Description
Summary: In large-scale multimodal remote sensing (RS) data archives, cross-modal technology for fast retrieval between different modalities has attracted great attention. This article focuses on cross-modal retrieval between remote sensing images and text. At present, a large heterogeneity gap remains between the semantic information extracted from different modalities in the remote sensing field; as a result, intraclass similarities and interclass differences cannot be exploited effectively during hash learning, which ultimately lowers cross-modal retrieval accuracy. In addition, supervised learning-based methods require a large number of labeled training samples, which limits the large-scale application of hash-based cross-modal retrieval in remote sensing. To address these problems, this article proposes a new unsupervised cross-autoencoder contrastive hashing (CACH) method for RS retrieval. The method constructs an end-to-end deep hashing model consisting mainly of a feature extraction module and a hash representation module. The feature extraction module extracts deep semantic information from the data of each modality and passes it through an intermediate layer to the hash representation module, which learns to generate binary hash codes. In the hashing module, a new multiobjective loss function strengthens intramodal and intermodal semantic consistency through multiscale semantic similarity constraints and contrastive learning, and a cross-autoencoding module reconstructs and compares hash features to reduce the loss of semantic information during learning. Extensive experiments are conducted on the UC Merced Land Use dataset and the RSICD dataset; results on these two popular benchmark datasets show that the proposed CACH method outperforms state-of-the-art unsupervised cross-modal hashing methods in RS.
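The abstract does not give implementation details, but the training objective it describes (relaxed binary hash codes aligned across modalities by a contrastive loss, plus cross-autoencoder reconstruction to limit semantic loss) can be illustrated with a minimal PyTorch sketch. All names and dimensions below (HashHead, CrossDecoder, 64-bit codes, the InfoNCE temperature) are illustrative assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashHead(nn.Module):
    """Maps modality-specific features to K-bit hash-like codes in (-1, 1)."""
    def __init__(self, in_dim: int, code_bits: int = 64):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, code_bits),
        )

    def forward(self, x):
        # tanh relaxes the discrete {-1, +1} codes for gradient-based training
        return torch.tanh(self.fc(x))

class CrossDecoder(nn.Module):
    """Reconstructs one modality's feature vector from the other's hash code."""
    def __init__(self, code_bits: int, out_dim: int):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(code_bits, 512), nn.ReLU(),
            nn.Linear(512, out_dim),
        )

    def forward(self, code):
        return self.fc(code)

def contrastive_loss(img_codes, txt_codes, temperature: float = 0.1):
    """InfoNCE-style intermodal loss: matching image-text pairs are positives,
    all other pairs in the batch serve as negatives."""
    img = F.normalize(img_codes, dim=1)
    txt = F.normalize(txt_codes, dim=1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy batch: assume image/text features come from pretrained backbones.
B, img_dim, txt_dim, bits = 8, 2048, 768, 64
img_feat = torch.randn(B, img_dim)
txt_feat = torch.randn(B, txt_dim)

img_head, txt_head = HashHead(img_dim, bits), HashHead(txt_dim, bits)
img_dec = CrossDecoder(bits, txt_dim)  # image code -> text feature
txt_dec = CrossDecoder(bits, img_dim)  # text code  -> image feature

img_code, txt_code = img_head(img_feat), txt_head(txt_feat)

loss = (contrastive_loss(img_code, txt_code)           # intermodal alignment
        + F.mse_loss(img_dec(img_code), txt_feat)      # cross reconstruction
        + F.mse_loss(txt_dec(txt_code), img_feat))
loss.backward()
print(f"total loss: {loss.item():.4f}")
```

At retrieval time, codes would be binarized with `torch.sign` and compared by Hamming distance; the paper's additional multiscale semantic similarity constraints are omitted here for brevity.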
DOI: 10.1109/JSTARS.2025.3538701