Cross-modality masked autoencoder for infrared and visible image fusion
| Published in: | Pattern recognition Vol. 172; p. 112767 |
|---|---|
| Main Authors: | Bi, Cong; Qian, Wenhua; Shao, Qiuhan; Cao, Jinde; Wang, Xue; Yan, Kaixiang |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.04.2026 |
| Subjects: | Cross-modality feature interaction; Image fusion; Infrared and visible image; Masked autoencoder; Transformer |
| ISSN: | 0031-3203 |
| Abstract | Highlights: • A cross-modality masked autoencoder is proposed to extract complementary information. • Complementary information is enhanced through a dual-dimensional Transformer. • The method is superior to state-of-the-art methods in maintaining saliency and texture fidelity.
Infrared and visible image fusion aims to synthesize a fused image that contains prominent targets and rich texture details. Effectively extracting and integrating cross-modality information remains a major challenge. In this paper, we propose an image fusion method based on a cross-modality masked autoencoder (CMMAE), called CMMAEFuse. First, we train the CMMAE, which supplements each modality with information from the other through a cross-modality feature interaction module, thereby strengthening the encoder’s ability to extract complementary information. We then design a dual-dimensional Transformer (DDT) that fuses the deep features extracted by the encoder to reconstruct the fused image. The DDT captures global interactions across the spatial and channel dimensions, exchanging information between them through a spatial interaction module and a channel interaction module; this cross-dimensional feature aggregation enhances complementary information and reduces redundancy. Extensive experiments demonstrate that CMMAEFuse surpasses state-of-the-art methods. In addition, an object detection application shows that CMMAEFuse improves the performance of downstream tasks. |
|---|---|
| ArticleNumber | 112767 |
| Author | Cao, Jinde; Wang, Xue; Bi, Cong; Qian, Wenhua; Shao, Qiuhan; Yan, Kaixiang |
| Author_xml | – sequence 1: Cong Bi (ORCID 0009-0007-8562-5964), School of Information Science and Engineering, Yunnan University, Kunming, 650091, China – sequence 2: Wenhua Qian (ORCID 0000-0002-2895-2121, whqian@ynu.edu.cn), School of Information Science and Engineering, Yunnan University, Kunming, 650091, China – sequence 3: Qiuhan Shao (ORCID 0009-0005-1505-1569), School of Information Science and Engineering, Yunnan University, Kunming, 650091, China – sequence 4: Jinde Cao (ORCID 0000-0003-3133-7119), The School of Mathematics, Southeast University, Nanjing, 210096, China – sequence 5: Xue Wang (ORCID 0000-0001-6674-8140), School of Information Science and Engineering, Yunnan University, Kunming, 650091, China – sequence 6: Kaixiang Yan (ORCID 0000-0002-6441-3352), School of Information Science and Engineering, Yunnan University, Kunming, 650091, China |
| ContentType | Journal Article |
| Copyright | 2025 Elsevier Ltd |
| DOI | 10.1016/j.patcog.2025.112767 |
| Discipline | Computer Science |
| ISSN | 0031-3203 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Cross-modality feature interaction; Infrared and visible image; Transformer; Masked autoencoder; Image fusion |
| Language | English |
| PublicationDate | April 2026 |
| PublicationTitle | Pattern recognition |
| PublicationYear | 2026 |
| Publisher | Elsevier Ltd |
| StartPage | 112767 |
| SubjectTerms | Cross-modality feature interaction; Image fusion; Infrared and visible image; Masked autoencoder; Transformer |
| Title | Cross-modality masked autoencoder for infrared and visible image fusion |
| URI | https://dx.doi.org/10.1016/j.patcog.2025.112767 |
| Volume | 172 |
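The abstract describes a dual-dimensional Transformer that captures global interactions over both the spatial and the channel dimension of a feature map. As a rough illustration of what "attention along two dimensions" means, the dependency-free sketch below runs single-head self-attention over a tiny feature map twice: once with spatial positions as tokens, and once with channels as tokens (transposing first, in the spirit of cross-covariance attention). This is not the authors' implementation; the identity projections and all names (`self_attention`, `feat`) are illustrative simplifications.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(a, b):
    # Plain-list matrix multiply: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def self_attention(x):
    # Single-head self-attention with identity Q/K/V projections,
    # so the example stays dependency-free: scores = x x^T / sqrt(d).
    d = len(x[0])
    scores = matmul(x, transpose(x))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, x)

# Feature map flattened to tokens: 4 spatial positions x 3 channels.
feat = [[1.0, 0.0, 2.0],
        [0.5, 1.0, 0.0],
        [2.0, 0.5, 1.0],
        [0.0, 2.0, 0.5]]

# Spatial attention: tokens are positions; mixing happens across space,
# and the attention map is positions x positions.
spatial_out = self_attention(feat)

# Channel attention: transpose first, so tokens are channels and the
# attention map is channels x channels; transpose back afterwards.
channel_out = transpose(self_attention(transpose(feat)))

print(len(spatial_out), len(spatial_out[0]))   # 4 3
print(len(channel_out), len(channel_out[0]))   # 4 3
```

Both passes return a tensor of the original shape, which is what lets a spatial branch and a channel branch exchange and aggregate features, as the DDT's interaction modules are described as doing.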