A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions

Saved in:
Detailed bibliography
Title: A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions
Authors: Ke Zhang, Peijie Li, Jianqiang Wang
Source: Remote Sensing, Vol 16, Iss 21, p 4113 (2024)
Publisher information: MDPI AG, 2024.
Publication year: 2024
Collection: LCC:Science
Subjects: remote sensing, image caption, encoder–decoder framework, attention mechanism, reinforcement learning, auxiliary task, Science
Description: Remote sensing images contain a wealth of Earth-observation information. Efficient extraction and application of the hidden knowledge in these images will greatly promote the development of resource and environment monitoring, urban planning and other related fields. Remote sensing image caption (RSIC) involves obtaining textual descriptions from remote sensing images by accurately capturing and describing the semantic-level relationships between objects and attributes in the images. However, there is currently no comprehensive review summarizing the progress in RSIC based on deep learning. After defining the scope of the papers discussed and summarizing them, this review first provides a comprehensive overview of recent advancements in RSIC, covering six key aspects: the encoder–decoder framework, attention mechanisms, reinforcement learning, learning with auxiliary tasks, large visual language models and few-shot learning. Subsequently, a brief explanation of the datasets and evaluation metrics for RSIC is given. Furthermore, we compare and analyze the results of the latest models and the pros and cons of different deep learning methods. Lastly, future directions for RSIC are suggested. The primary objective of this review is to offer researchers a more profound understanding of RSIC.
Document type: article
File description: electronic resource
Language: English
ISSN: 2072-4292
Relation: https://www.mdpi.com/2072-4292/16/21/4113; https://doaj.org/toc/2072-4292
DOI: 10.3390/rs16214113
Access URL: https://doaj.org/article/49299cde34b64a95bf5d971e2860772f
Accession number: edsdoj.49299cde34b64a95bf5d971e2860772f
Database: Directory of Open Access Journals