Cross-modality masked autoencoder for infrared and visible image fusion
Saved in:

| Published in: | Pattern Recognition, Vol. 172, p. 112767 |
|---|---|
| Main authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.04.2026 |
| Subjects: | |
| ISSN: | 0031-3203 |
| Online access: | Full text |
| Abstract: | Highlights: • A cross-modality masked autoencoder is proposed to extract complementary information. • Complementary information is enhanced through a dual-dimensional Transformer. • Superior to state-of-the-art methods in maintaining saliency and texture fidelity. Infrared and visible image fusion aims to synthesize a fused image that contains prominent targets and rich texture details. Effectively extracting and integrating cross-modality information remains a major challenge. In this paper, we propose an image fusion method based on a cross-modality masked autoencoder (CMMAE), called CMMAEFuse. First, we train the CMMAE, which uses information from one modality to supplement the other through a cross-modality feature interaction module, thereby strengthening the encoder's ability to extract complementary information. Subsequently, we design a dual-dimensional Transformer (DDT) that fuses the deep features extracted by the encoder to reconstruct the fused image. The DDT captures global interactions across the spatial and channel dimensions and exchanges information between them through a spatial interaction module and a channel interaction module, aggregating features across dimensions to enhance complementary information and reduce redundancy. Extensive experiments demonstrate that CMMAEFuse surpasses state-of-the-art methods. In addition, an object detection application shows that CMMAEFuse improves the performance of downstream tasks. (See the illustrative sketch after this record.) |
|---|---|
| DOI: | 10.1016/j.patcog.2025.112767 |
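
The abstract names two components: a cross-modality feature interaction module that lets one modality supplement the other, and a dual-dimensional Transformer that attends over both the spatial and the channel dimension before exchanging information between the two branches. The PyTorch sketch below is a minimal, hypothetical illustration of these two ideas only; the class names (`CrossModalityInteraction`, `DualDimensionalBlock`), the attention formulations, and all hyperparameters are assumptions of this sketch, not the authors' implementation.

```python
# Illustrative sketch only: names and hyperparameters are assumptions,
# not the CMMAEFuse implementation from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalityInteraction(nn.Module):
    """Toy cross-modality feature interaction: tokens of one modality attend
    to the other modality's tokens, so one stream can supplement the other
    (e.g. masked infrared patches drawing on visible features)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # target, source: (B, N, C) token sequences of the two modalities.
        kv = self.norm_kv(source)
        supplement, _ = self.cross_attn(self.norm_q(target), kv, kv)
        return target + supplement  # residual: keep target, add cross-modal info


class DualDimensionalBlock(nn.Module):
    """Toy dual-dimensional block: self-attention over spatial tokens, a C x C
    channel-attention map, and a 1x1-conv exchange between the two branches."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temperature = nn.Parameter(torch.ones(1))
        self.exchange = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))        # (B, H*W, C)

        # Spatial dimension: every position attends to every other position.
        spatial, _ = self.spatial_attn(tokens, tokens, tokens)
        spatial = spatial.transpose(1, 2).reshape(b, c, h, w)

        # Channel dimension: channels attend to each other via a C x C map.
        flat = x.flatten(2)                                     # (B, C, H*W)
        attn = F.softmax(flat @ flat.transpose(1, 2) * self.temperature, dim=-1)
        channel = (attn @ flat).reshape(b, c, h, w)

        # Exchange information between the two branches; residual output.
        return x + self.exchange(torch.cat([spatial, channel], dim=1))


if __name__ == "__main__":
    dim, h, w = 64, 32, 32
    ir = torch.randn(1, h * w, dim)    # infrared tokens (dummy features)
    vis = torch.randn(1, h * w, dim)   # visible tokens (dummy features)
    ir = CrossModalityInteraction(dim)(ir, vis)
    feat = ir.transpose(1, 2).reshape(1, dim, h, w)
    print(DualDimensionalBlock(dim)(feat).shape)  # torch.Size([1, 64, 32, 32])
```

The C x C channel-attention map keeps the channel branch independent of the input resolution, which is why the sketch uses it instead of token attention there; how the paper actually realizes the spatial and channel interaction modules is not specified in the abstract.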