MRFuse: Metric learning and masked autoencoder for fusing real infrared and visible images
| Published in: | Optics and laser technology, Vol. 189, p. 112971 |
|---|---|
| Main authors: | , , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.11.2025 |
| Subjects: | |
| ISSN: | 0030-3992 |
| Online access: | Full text |
| Summary: | The task of infrared and visible image fusion aims to retain the thermal targets from infrared images while preserving the details, brightness, and other important features from visible images. Current methods face challenges such as unclear fusion objectives, difficulty in interpreting the learning process, and uncontrollable auxiliary learning weights. To address these issues, this paper proposes MRFuse, a novel fusion method for real infrared and visible images based on metric learning and masked autoencoders. MRFuse combines a metric mapping space, auxiliary networks, and a fusion network. First, we introduce a Real Degradation Estimation Module (RDEM), which uses a simple neural network to establish a controllable degradation estimation scheme within the metric space. To train the metric space, we propose a sample generation method that supplies complex training samples for the metric learning pipeline. Next, we present a fusion network based on masked autoencoding: we construct hybrid masked infrared and visible image pairs and design a U-shaped ViT encoder–decoder that exploits hierarchical feature representation and layer-wise fusion to reconstruct high-quality fused images. Finally, to train the fusion network, we design a masked region loss that constrains reconstruction errors within masked regions, complemented by gradient, structural consistency, and perceptual losses to further enhance fusion quality. Extensive experiments demonstrate that MRFuse offers superior controllability, excels at suppressing noise, blur, and glare, and outperforms other state-of-the-art methods in both subjective and objective evaluations. |
|---|---|
| Highlights: | • A new controllable and interpretable method for infrared and visible image fusion. • Metric space guides the fusion process. • A new hybrid mask input method increases the robustness of the fusion network. • MAE networks with masking loss constraints produce superior fusion images. |
| DOI: | 10.1016/j.optlastec.2025.112971 |
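The masked region loss mentioned in the abstract, which constrains reconstruction errors within masked regions, can be sketched roughly as follows. This is a minimal illustration assuming an L1 reconstruction error restricted to masked pixels; the function and variable names are assumptions, not the paper's actual implementation.

```python
import numpy as np

def masked_region_loss(fused, target, mask):
    """L1 reconstruction error averaged over masked pixels only.

    fused, target: float arrays of shape (H, W), e.g. reconstructed
        and reference images.
    mask: binary array of shape (H, W); 1 marks a masked pixel.
    """
    mask = mask.astype(bool)
    if not mask.any():
        # No masked pixels: nothing to constrain.
        return 0.0
    # Restrict the absolute error to masked positions before averaging.
    return float(np.abs(fused[mask] - target[mask]).mean())

# Toy example on a 2x2 image where only the top row is masked.
fused = np.array([[0.2, 0.8], [0.5, 0.1]])
target = np.array([[0.0, 1.0], [0.5, 0.5]])
mask = np.array([[1, 1], [0, 0]])
loss = masked_region_loss(fused, target, mask)
```

Errors in unmasked regions (here the bottom row) do not contribute, which is what focuses the training signal on reconstructing the hidden content.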