MRFuse: Metric learning and masked autoencoder for fusing real infrared and visible images
| Published in: | Optics and laser technology, Vol. 189, p. 112971 |
|---|---|
| Main authors: | , , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.11.2025 |
| Subjects: | |
| ISSN: | 0030-3992 |
| Online access: | Full text |
| Summary: | The task of infrared and visible image fusion aims to retain the thermal targets from infrared images while preserving the details, brightness, and other important features from visible images. Current methods face challenges such as unclear fusion objectives, difficulty in interpreting the learning process, and uncontrollable auxiliary learning weights. To address these issues, this paper proposes MRFuse, a novel fusion method for real infrared and visible images based on metric learning and masked autoencoders. MRFuse combines a metric mapping space, auxiliary networks, and a fusion network. First, we introduce a Real Degradation Estimation Module (RDEM), which uses a simple neural network to establish a controllable degradation estimation scheme within the metric space. To train the metric space, we propose a sample generation method that supplies complex training samples for the metric learning pipeline. Next, we present a fusion network based on masked autoencoding: we construct hybrid masked infrared and visible image pairs and design a U-shaped ViT encoder–decoder that exploits hierarchical feature representation and layer-wise fusion to reconstruct high-quality fused images. Finally, to train the fusion network, we design a masked region loss that constrains reconstruction errors within masked regions, complemented by gradient, structural consistency, and perceptual losses to further enhance fusion quality. Extensive experiments demonstrate that MRFuse offers superior controllability, excels at suppressing noise, blur, and glare, and outperforms other state-of-the-art methods in both subjective and objective evaluations. |
|---|---|
| Highlights: | • A new controllable and interpretable method for infrared and visible image fusion. • Metric space guides the fusion process. • A new hybrid mask input method increases the robustness of the fusion network. • MAE networks with masking loss constraints produce superior fusion images. |
| DOI: | 10.1016/j.optlastec.2025.112971 |
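The masked region loss mentioned in the abstract, which constrains reconstruction errors within masked regions, can be sketched roughly as follows. This is a minimal illustration assuming an L1 reconstruction error restricted to masked pixels; the function and variable names are assumptions, not the paper's actual implementation.

```python
import numpy as np

def masked_region_loss(fused, target, mask):
    """L1 reconstruction error averaged over masked pixels only.

    fused, target: float arrays of shape (H, W), e.g. reconstructed
        and reference images.
    mask: binary array of shape (H, W); 1 marks a masked pixel.
    """
    mask = mask.astype(bool)
    if not mask.any():
        # No masked pixels: nothing to constrain.
        return 0.0
    # Restrict the absolute error to masked positions before averaging.
    return float(np.abs(fused[mask] - target[mask]).mean())

# Toy example on a 2x2 image where only the top row is masked.
fused = np.array([[0.2, 0.8], [0.5, 0.1]])
target = np.array([[0.0, 1.0], [0.5, 0.5]])
mask = np.array([[1, 1], [0, 0]])
loss = masked_region_loss(fused, target, mask)
```

Errors in unmasked regions (here the bottom row) do not contribute, which is what focuses the training signal on reconstructing the hidden content.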