Cross-modality masked autoencoder for infrared and visible image fusion

Bibliographic Details
Published in: Pattern Recognition, Vol. 172, p. 112767
Main Authors: Bi, Cong; Qian, Wenhua; Shao, Qiuhan; Cao, Jinde; Wang, Xue; Yan, Kaixiang
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.04.2026
ISSN: 0031-3203
Description
Summary:
Highlights:
• A cross-modality masked autoencoder is proposed to extract complementary information.
• Complementary information is enhanced through a dual-dimensional Transformer.
• Superior to state-of-the-art methods in maintaining saliency and texture fidelity.

Infrared and visible image fusion aims to synthesize a fused image that contains prominent targets and rich texture details. Effectively extracting and integrating cross-modality information remains a major challenge. In this paper, we propose an image fusion method based on a cross-modality masked autoencoder (CMMAE), called CMMAEFuse. First, we train the CMMAE, which uses information from one modality to supplement the other through a cross-modality feature interaction module, thereby strengthening the encoder's ability to extract complementary information. We then design a dual-dimensional Transformer (DDT) that fuses the deep features extracted by the encoder to reconstruct the fused image. The DDT captures global interactions across the spatial and channel dimensions and exchanges information between them through a spatial interaction module and a channel interaction module, aggregating features across dimensions to enhance complementary information and reduce redundancy. Extensive experiments demonstrate that CMMAEFuse surpasses state-of-the-art methods. In addition, an object detection application shows that CMMAEFuse improves performance on downstream tasks.
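The abstract names the two key components, the cross-modality feature interaction module and the dual-dimensional Transformer, without giving implementation details. The sketch below is one way those ideas could look in PyTorch; every name, shape, and design choice in it (CrossModalityInteraction, DualDimensionalBlock, the gated branch mixing, the 75% mask ratio) is an assumption made for illustration, not the paper's actual CMMAE or DDT design.

```python
# Illustrative sketch only: the abstract gives no implementation details,
# so every module below (names, shapes, attention layout, gating) is an
# assumption chosen to make the two ideas concrete, not the authors' code.
import torch
import torch.nn as nn


class CrossModalityInteraction(nn.Module):
    """CMMAE idea: supplement one modality's masked tokens with the other
    modality's features via cross-attention, so the encoder learns to
    recover masked content from complementary information."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, masked: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # masked: (B, N, C) partially masked tokens of one modality
        # other:  (B, N, C) tokens of the complementary modality
        out, _ = self.cross_attn(masked, other, other)
        return masked + out


class DualDimensionalBlock(nn.Module):
    """DDT idea: attend globally over spatial tokens and over channel tokens,
    then mix the two branches with a learned gate (a stand-in for the paper's
    spatial/channel interaction modules)."""

    def __init__(self, dim: int, num_tokens: int, heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.channel_attn = nn.MultiheadAttention(num_tokens, heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_c = nn.LayerNorm(num_tokens)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Spatial branch: one token per location, attention over N = H*W tokens.
        hs = self.norm_s(x)
        s, _ = self.spatial_attn(hs, hs, hs)
        s = x + s
        # Channel branch: transpose so each channel becomes a token of length N.
        xc = x.transpose(1, 2)
        hc = self.norm_c(xc)
        c, _ = self.channel_attn(hc, hc, hc)
        c = (xc + c).transpose(1, 2)
        # Interaction: a per-feature gate decides how to mix the two dimensions.
        g = self.gate(torch.cat([s, c], dim=-1))
        return g * s + (1.0 - g) * c


if __name__ == "__main__":
    B, N, C = 2, 256, 64          # e.g. 16x16 patch tokens, 64-dim features
    ir, vis = torch.randn(B, N, C), torch.randn(B, N, C)

    # MAE-style random masking of the infrared tokens (ratio is an assumption).
    keep = torch.rand(B, N, 1) > 0.75
    ir_masked = ir * keep

    interact = CrossModalityInteraction(dim=C)
    fuse = DualDimensionalBlock(dim=C, num_tokens=N)
    supplemented = interact(ir_masked, vis)   # visible cues fill masked IR tokens
    fused = fuse(supplemented)                # dual-dimensional feature fusion
    print(fused.shape)                        # torch.Size([2, 256, 64])
```

Treating channels as tokens (the transpose in the channel branch) is a common way to obtain global channel-wise attention at modest cost; the paper's actual spatial and channel interaction modules may differ substantially from this gated mixing.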
DOI: 10.1016/j.patcog.2025.112767