YDTR: Infrared and Visible Image Fusion via Y-Shape Dynamic Transformer

Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 25, pp. 5413-5428
Main Authors: Tang, Wei; He, Fazhi; Liu, Yu
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023
ISSN: 1520-9210, 1941-0077
Online Access: Full text
Description
Summary: Infrared and visible image fusion aims to generate a composite image that simultaneously describes the salient target in the infrared image and the texture details in the visible image of the same scene. Since deep learning (DL) exhibits great feature extraction ability in computer vision tasks, it has also been widely employed to handle the infrared and visible image fusion problem. However, existing DL-based methods generally extract complementary information from the source images through convolutional operations, which results in limited preservation of global features. To this end, we propose a novel infrared and visible image fusion method, the Y-shape dynamic Transformer (YDTR). Specifically, a dynamic Transformer module (DTRM) is designed to acquire not only local features but also significant context information. Furthermore, the proposed network is devised in a Y shape to comprehensively maintain the thermal radiation information from the infrared image and the scene details from the visible image. Considering the specific information provided by the source images, we design a loss function consisting of two terms to improve fusion quality: a structural similarity (SSIM) term and a spatial frequency (SF) term. Extensive experiments on mainstream datasets illustrate that the proposed method outperforms both classical and state-of-the-art approaches in qualitative and quantitative assessments. We further apply YDTR to infrared and RGB-visible image pairs and to multi-focus images without fine-tuning, and the satisfactory fusion results demonstrate that the proposed method has good generalization capability.
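
To make the two-term loss mentioned in the abstract concrete, the following is a minimal NumPy sketch, not the paper's actual implementation: it uses the standard spatial frequency definition SF = sqrt(RF^2 + CF^2) from row/column gradients and a simplified single-window SSIM, and the names fusion_loss and lam, the weight value, and the choice of SF reference (the sharper source image) are all illustrative assumptions; YDTR's exact formulation may differ.

    import numpy as np

    def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
        # Single-window (global) SSIM over two grayscale images; a
        # simplification of the standard windowed SSIM, assumed here.
        x = x.astype(np.float64)
        y = y.astype(np.float64)
        mx, my = x.mean(), y.mean()
        vx, vy = x.var(), y.var()
        cov = ((x - mx) * (y - my)).mean()
        return ((2 * mx * my + c1) * (2 * cov + c2)) / (
            (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
        )

    def spatial_frequency(img):
        # Standard SF: sqrt(RF^2 + CF^2) from horizontal/vertical
        # first differences. Cast first to avoid uint8 wrap-around.
        img = img.astype(np.float64)
        rf = np.sqrt(np.mean((img[:, 1:] - img[:, :-1]) ** 2))  # row frequency
        cf = np.sqrt(np.mean((img[1:, :] - img[:-1, :]) ** 2))  # column frequency
        return np.sqrt(rf ** 2 + cf ** 2)

    def fusion_loss(fused, ir, vis, lam=1.0):
        # Hypothetical two-term loss: an SSIM term rewarding similarity
        # to both sources plus lam times an SF term pushing the fused
        # image's SF toward the sharper source. The actual YDTR loss
        # may weight and combine these terms differently.
        ssim_term = 1.0 - 0.5 * (ssim_global(fused, ir) + ssim_global(fused, vis))
        sf_term = abs(spatial_frequency(fused)
                      - max(spatial_frequency(ir), spatial_frequency(vis)))
        return ssim_term + lam * sf_term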
DOI: 10.1109/TMM.2022.3192661