Vector-quantized dual-branch fusion network for robust image fusion and anomaly suppression
| Published in: | Information Fusion, Vol. 126, p. 103630 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.02.2026 |
| Subjects: | |
| ISSN: | 1566-2535 |
| Online access: | Full text |
| Abstract: | • Propose a Vector-Quantized Dual-Branch Fusion Framework: develop a novel architecture with discrete codebooks to eliminate ambiguous latent spaces, fundamentally suppressing the propagation of anomalies (speckle noise, sidelobes) and environmental interference (clouds, low-light degradation) in multimodal fusion. • Innovative HRPE Attention Mechanism for Cross-Modal Alignment: we propose a simple yet effective HRPE attention algorithm, which strengthens the transformer by embedding both spatial position and modality-specific information. • State-of-the-Art Performance Across Diverse Scenarios: validated on three benchmark datasets (OGSOD, GF-cloud, MSRS), outperforming seven SOTA methods in 8/11 metrics for visible-SAR fusion (OGSOD) and 7/11 metrics for cloud-degraded visible-SAR fusion (GF-cloud), with top-tier performance in low-light multispectral fusion (MSRS), demonstrating robustness under all-day, all-weather conditions.
Multimodal image fusion (MMIF) plays a crucial role in image information processing, yet it still faces persistent challenges in handling anomalies robustly. This paper proposes a novel vector-quantization-based dual-branch autoencoder fusion algorithm to overcome these limitations. First, we establish two VQ codebooks for global and local features, which are learned and disentangled through two network branches. Next, an attention-based network learns the global features, enhanced with a novel hybrid rotary position embedding (HRPE) module. Within the CNN branch, the Convolutional Block Attention Module (CBAM) is employed to capture detailed features. Finally, the fused image is reconstructed by a decoder from the VQ features. Extensive experiments and quantitative metrics across three benchmark datasets (OGSOD, GF-cloud, MSRS) show that our method outperforms state-of-the-art methods, particularly excelling in anomaly suppression and structural-fidelity preservation. Overall, the proposed framework offers a robust solution for all-weather multimodal image fusion, with immediate applications in agricultural image analysis, surveillance imaging, and disaster-response systems requiring reliable multimodal information integration. |
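
To make the pipeline described in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of a dual-branch encoder whose global and local features are snapped to discrete codebooks before a decoder fuses them. The module choices, dimensions, and names (`DualBranchVQFusion`, `vector_quantize`) are illustrative assumptions; training losses (reconstruction, codebook/commitment terms) and the full HRPE/CBAM blocks are omitted.

```python
# Sketch of the dual-branch VQ fusion flow described in the abstract.
# Assumptions: PyTorch, simplified stand-in branches, no training losses.
import torch
import torch.nn as nn


def vector_quantize(z, codebook):
    """Replace each feature vector in z (B, N, D) with its nearest codebook entry (K, D)."""
    dist = torch.cdist(z, codebook.unsqueeze(0).expand(z.size(0), -1, -1))  # (B, N, K)
    idx = dist.argmin(dim=-1)                                               # (B, N)
    z_q = codebook[idx]                                                     # (B, N, D)
    # Straight-through estimator so gradients still reach the encoder branch.
    return z + (z_q - z).detach()


class DualBranchVQFusion(nn.Module):
    def __init__(self, dim=256, codebook_size=512):
        super().__init__()
        # Hypothetical stand-ins for the paper's transformer (global, HRPE)
        # and CNN (local, CBAM) branches.
        self.global_branch = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.local_branch = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
        # Two discrete codebooks: one for global, one for local features.
        self.global_codebook = nn.Parameter(torch.randn(codebook_size, dim))
        self.local_codebook = nn.Parameter(torch.randn(codebook_size, dim))
        self.decoder = nn.Conv2d(2 * dim, 3, 3, padding=1)  # placeholder fusion decoder

    def forward(self, feats):                      # feats: (B, dim, H, W) joint multimodal features
        b, d, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H*W, dim)
        g_q = vector_quantize(self.global_branch(tokens), self.global_codebook)
        loc = self.local_branch(feats).flatten(2).transpose(1, 2)
        l_q = vector_quantize(loc, self.local_codebook)
        fused = torch.cat([g_q, l_q], dim=-1).transpose(1, 2).reshape(b, 2 * d, h, w)
        return self.decoder(fused)                 # fused image
```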
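
The HRPE module itself is only named in the abstract, so the sketch below shows the standard rotary position embedding operation it builds on, applied to attention queries and keys, with a hypothetical per-modality position offset standing in for the "modality-specific" part; the actual hybrid scheme in the paper may differ.

```python
# Sketch of rotary position embedding (RoPE) on attention queries/keys.
# The per-modality offset is an assumption, not the paper's HRPE design.
import torch


def rotary_embed(x, positions, base=10000.0):
    """Rotate channel pairs of x (B, N, D) by angles that grow with token position."""
    b, n, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)   # (D/2,)
    angles = positions[:, :, None].float() * freqs                      # (B, N, D/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


# Usage: rotate queries and keys before the attention dot product.
B, N, D = 2, 16, 64
q, k = torch.randn(B, N, D), torch.randn(B, N, D)
pos = torch.arange(N).expand(B, N)
modality_offset = torch.zeros(B, N)          # e.g. 0 for visible tokens, N for SAR tokens (assumed)
q_rot = rotary_embed(q, pos + modality_offset)
k_rot = rotary_embed(k, pos + modality_offset)
attn = torch.softmax(q_rot @ k_rot.transpose(1, 2) / D ** 0.5, dim=-1)
```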
| ISSN: | 1566-2535 |
|---|---|
| DOI: | 10.1016/j.inffus.2025.103630 |