Cross-modality masked autoencoder for infrared and visible image fusion
Saved in:

| Published in: | Pattern Recognition, Vol. 172, p. 112767 |
|---|---|
| Main authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.04.2026 |
| Subjects: | |
| ISSN: | 0031-3203 |
| Online access: | Full text |
| Abstract: | Highlights: • A cross-modality masked autoencoder is proposed to extract complementary information. • Complementary information is enhanced through a dual-dimensional Transformer. • Superior to state-of-the-art methods in maintaining saliency and texture fidelity. Infrared and visible image fusion aims to synthesize a fused image that contains prominent targets and rich texture details. Effectively extracting and integrating cross-modality information remains a major challenge. In this paper, we propose an image fusion method based on a cross-modality masked autoencoder (CMMAE), called CMMAEFuse. First, we train the CMMAE, which uses information from one modality to supplement the other through a cross-modality feature interaction module, thereby strengthening the encoder's ability to extract complementary information. Subsequently, we design a dual-dimensional Transformer (DDT) that fuses the deep features extracted by the encoder to reconstruct the fused image. The DDT captures global interactions across the spatial and channel dimensions and exchanges information between them through a spatial interaction module and a channel interaction module, aggregating features across dimensions to enhance complementary information and reduce redundancy. Extensive experiments demonstrate that CMMAEFuse surpasses state-of-the-art methods. In addition, an object detection application shows that CMMAEFuse improves the performance of downstream tasks. (See the illustrative sketch after this record.) |
|---|---|
| DOI: | 10.1016/j.patcog.2025.112767 |
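
The abstract names two components: a cross-modality feature interaction module that lets one modality supplement the other, and a dual-dimensional Transformer that attends over both the spatial and the channel dimension before exchanging information between the two branches. The PyTorch sketch below is a minimal, hypothetical illustration of these two ideas only; the class names (`CrossModalityInteraction`, `DualDimensionalBlock`), the attention formulations, and all hyperparameters are assumptions of this sketch, not the authors' implementation.

```python
# Illustrative sketch only: names and hyperparameters are assumptions,
# not the CMMAEFuse implementation from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalityInteraction(nn.Module):
    """Toy cross-modality feature interaction: tokens of one modality attend
    to the other modality's tokens, so one stream can supplement the other
    (e.g. masked infrared patches drawing on visible features)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target, source: (B, N, C) token sequences of the two modalities.
        kv = self.norm_kv(source)
        supplement, _ = self.cross_attn(self.norm_q(target), kv, kv)
        return target + supplement  # residual: keep target, add cross-modal info


class DualDimensionalBlock(nn.Module):
    """Toy dual-dimensional block: self-attention over spatial tokens, a C x C
    channel-attention map, and a 1x1-conv exchange between the two branches."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temperature = nn.Parameter(torch.ones(1))
        self.exchange = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))        # (B, H*W, C)

        # Spatial dimension: every position attends to every other position.
        spatial, _ = self.spatial_attn(tokens, tokens, tokens)
        spatial = spatial.transpose(1, 2).reshape(b, c, h, w)

        # Channel dimension: channels attend to each other via a C x C map.
        flat = x.flatten(2)                                     # (B, C, H*W)
        attn = F.softmax(flat @ flat.transpose(1, 2) * self.temperature, dim=-1)
        channel = (attn @ flat).reshape(b, c, h, w)

        # Exchange information between the two branches; residual output.
        return x + self.exchange(torch.cat([spatial, channel], dim=1))


if __name__ == "__main__":
    dim, h, w = 64, 32, 32
    ir = torch.randn(1, h * w, dim)    # infrared tokens (dummy features)
    vis = torch.randn(1, h * w, dim)   # visible tokens (dummy features)
    ir = CrossModalityInteraction(dim)(ir, vis)
    feat = ir.transpose(1, 2).reshape(1, dim, h, w)
    print(DualDimensionalBlock(dim)(feat).shape)  # torch.Size([1, 64, 32, 32])
```

The C x C channel-attention map keeps the channel branch independent of the input resolution, which is why the sketch uses it instead of token attention there; how the paper actually realizes the spatial and channel interaction modules is not specified in the abstract.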