Vector-quantized dual-branch fusion network for robust image fusion and anomaly suppression

Detailed Bibliography
Published in: Information Fusion, Volume 126, Article 103630
Main authors: Huang, Siyang; Su, Shaojing; Wei, Junyu; Hu, Liushun; Cheng, Zhangjunjie
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 February 2026
ISSN: 1566-2535
Description
Summary:
• Vector-Quantized Dual-Branch Fusion Framework: we develop a novel architecture with discrete codebooks that eliminates ambiguous latent spaces, fundamentally suppressing the propagation of anomalies (speckle noise, sidelobes) and environmental interference (clouds, low-light degradation) in multimodal fusion.
• Innovative HRPE attention mechanism for cross-modal alignment: we propose a simple yet effective HRPE attention algorithm that enhances the transformer by embedding both spatial position and modality-specific information.
• State-of-the-art performance across diverse scenarios: validated on three benchmark datasets (OGSOD, GF-cloud, MSRS), outperforming seven SOTA methods in 8/11 metrics for visible-SAR fusion (OGSOD) and 7/11 metrics for cloud-degraded visible-SAR fusion (GF-cloud), with top-tier performance in low-light multispectral fusion (MSRS), demonstrating robustness under all-day, all-weather conditions.

Multimodal image fusion (MMIF) plays a crucial role in image information processing, yet it faces persistent challenges in handling anomalies to achieve robust fusion. This paper proposes a novel vector-quantization-based dual-branch autoencoder fusion algorithm to overcome these limitations. First, we establish two VQ codebooks for global and local features, which are learned and disentangled through two network branches. We then employ an attention-based network to learn the global features, enhancing it with a novel hybrid rotary position embedding (HRPE) module; within the CNN branch, the Convolutional Block Attention Module (CBAM) captures detailed features. Finally, the fused image is formed by a decoder from the VQ features. Extensive experiments and quantitative metrics across three benchmark datasets (OGSOD, GF-cloud, MSRS) indicate that our method outperforms state-of-the-art methods, particularly excelling in anomaly suppression and structural-fidelity preservation. Overall, the proposed framework offers a robust solution for all-weather multimodal image fusion tasks, with immediate applications in agricultural image analysis, surveillance imaging, and disaster-response systems requiring reliable multimodal information integration.
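The abstract's key step is snapping each branch's continuous features to entries of a discrete VQ codebook, so that ambiguous latent values cannot carry speckle, sidelobe, or cloud artifacts into the fused image. Below is a minimal PyTorch sketch of that quantization step, assuming a standard VQ-VAE-style nearest-neighbour lookup with a commitment loss and straight-through estimator; the class name, code count, and feature dimension are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VQCodebook(nn.Module):
    """Discrete codebook: snaps continuous branch features to nearest code vectors.

    Hypothetical sketch of the quantization step described in the abstract;
    hyperparameters and naming are assumptions, not the paper's implementation.
    """

    def __init__(self, num_codes=512, dim=256):
        super().__init__()
        self.codes = nn.Embedding(num_codes, dim)
        nn.init.uniform_(self.codes.weight, -1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z):
        # z: (B, N, D) feature tokens from one branch (global/transformer or local/CNN)
        # squared Euclidean distance from every token to every code vector -> (B, N, num_codes)
        dist = (z.pow(2).sum(-1, keepdim=True)
                - 2 * z @ self.codes.weight.t()
                + self.codes.weight.pow(2).sum(-1))
        idx = dist.argmin(dim=-1)              # index of the nearest code per token
        z_q = self.codes(idx)                  # quantized (discretized) features, (B, N, D)
        # codebook and commitment losses, as in standard VQ-VAE training
        vq_loss = F.mse_loss(z_q, z.detach()) + 0.25 * F.mse_loss(z, z_q.detach())
        # straight-through estimator: gradients flow to the encoder past the argmin
        z_q = z + (z_q - z).detach()
        return z_q, vq_loss, idx

# Toy usage: quantize one branch's flattened feature map (e.g. 14x14 tokens of width 256).
tokens = torch.randn(4, 196, 256)
quantized, vq_loss, _ = VQCodebook()(tokens)

In the dual-branch design summarized above, one such codebook would serve the attention (global) branch and another the CNN (local) branch, with the decoder reconstructing the fused image from the two sets of quantized features; this wiring is paraphrased from the abstract, not taken from the paper itself.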
DOI: 10.1016/j.inffus.2025.103630