MRFuse: Metric learning and masked autoencoder for fusing real infrared and visible images

Published in: Optics and Laser Technology, Volume 189, p. 112971
Main authors: Li, YuBin; Zhan, Weida; Guo, Jinxin; Zhu, Depeng; Jiang, Yichun; Chen, Yu; Xu, Xiaoyu; Han, Deng
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.11.2025
ISSN: 0030-3992
Description

Summary: The task of infrared and visible image fusion aims to retain the thermal targets from infrared images while preserving the details, brightness, and other important features from visible images. Current methods face challenges such as unclear fusion objectives, difficulty in interpreting the learning process, and uncontrollable auxiliary learning weights. To address these issues, this paper proposes a novel fusion method based on metric learning and masked autoencoders for real infrared and visible image fusion, termed MRFuse. MRFuse operates through a combination of a metric mapping space, auxiliary networks, and fusion networks. First, we introduce a Real Degradation Estimation Module (RDEM), which employs a simple neural network to establish a controllable degradation estimation scheme within the metric space. Additionally, to train the metric space, we propose a sample generation method that provides complex training samples for the metric learning pipeline. Next, we present a fusion network based on masked autoencoding. Specifically, we construct hybrid masked infrared and visible image pairs and design a U-shaped ViT encoder–decoder architecture. This architecture leverages hierarchical feature representation and layer-wise fusion to reconstruct high-quality fused images. Finally, to train the fusion network, we design a masked region loss to constrain reconstruction errors within masked regions, and further employ gradient loss, structural consistency loss, and perceptual loss to enhance the quality of the fused images. Extensive experiments demonstrate that MRFuse exhibits superior controllability and excels in suppressing noise, blur, and glare, outperforming other state-of-the-art methods in both subjective and objective evaluations.

Highlights:
• A new controllable and interpretable method for infrared and visible image fusion.
• The metric space guides the fusion process.
• A new hybrid-mask input method increases the robustness of the fusion network.
• MAE networks with masking-loss constraints produce superior fused images.
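The record carries no code, so the following is only a rough illustration of the metric-learning component described in the summary: a small embedding network trained with a triplet margin loss so that samples sharing the same synthetic degradation land close together in the metric space. The `DegradationEncoder` architecture, margin, and sampling scheme below are all assumptions for the sketch, not the authors' actual RDEM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DegradationEncoder(nn.Module):
    """Hypothetical stand-in: a small CNN mapping a grayscale image
    to a unit-norm embedding in the learned metric space."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

# Triplet loss: anchor and positive share the same synthetic degradation
# (e.g. the same noise level); the negative carries a different one,
# following a sample-generation scheme like the one the paper proposes.
triplet = nn.TripletMarginLoss(margin=0.5)

def metric_step(encoder, anchor, positive, negative):
    return triplet(encoder(anchor), encoder(positive), encoder(negative))
```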
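The hybrid masked input pairs and the masked region loss can be pictured as follows: complementary patch masks hide part of each modality, and the reconstruction error is averaged only over the hidden patches, so the network must recover each modality from the other. A minimal PyTorch sketch; the patch size, mask ratio, and helper names are assumptions, and the paper's exact masking scheme may differ.

```python
import torch

def hybrid_mask_pairs(ir, vis, patch=16, ratio=0.5):
    # ir, vis: (B, C, H, W) tensors with H and W divisible by `patch`
    b, _, h, w = ir.shape
    gh, gw = h // patch, w // patch
    # Random binary grid: 1 keeps the IR patch, 0 keeps the VIS patch
    keep_ir = (torch.rand(b, 1, gh, gw, device=ir.device) > ratio).float()
    mask_ir = keep_ir.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    # Complementary masking: patches hidden in IR stay visible in VIS
    return ir * mask_ir, vis * (1.0 - mask_ir), mask_ir

def masked_region_loss(recon, target, visible_mask):
    # Penalize reconstruction error only where the input was hidden
    hidden = 1.0 - visible_mask
    return ((recon - target).abs() * hidden).sum() / hidden.sum().clamp(min=1.0)
```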
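For the gradient loss term, a common formulation in fusion work (which may not match the paper's exact definition) pushes the fused image's Sobel gradient magnitudes toward the element-wise maximum of the two source images' gradients, assuming single-channel inputs:

```python
import torch
import torch.nn.functional as F

# Sobel kernel for horizontal gradients; its transpose gives the vertical one
_KX = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)

def _grad_mag(x):
    kx = _KX.to(x.device)
    ky = kx.transpose(2, 3)
    return F.conv2d(x, kx, padding=1).abs() + F.conv2d(x, ky, padding=1).abs()

def gradient_loss(fused, ir, vis):
    # Fused gradients should match the stronger of the two source gradients
    target = torch.maximum(_grad_mag(ir), _grad_mag(vis))
    return F.l1_loss(_grad_mag(fused), target)
```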
DOI: 10.1016/j.optlastec.2025.112971