Cross-modality masked autoencoder for infrared and visible image fusion

Bibliographic Details
Published in: Pattern Recognition, Vol. 172, Art. no. 112767
Main Authors: Bi, Cong, Qian, Wenhua, Shao, Qiuhan, Cao, Jinde, Wang, Xue, Yan, Kaixiang
Format: Journal Article
Language:English
Published: Elsevier Ltd 01.04.2026
Subjects:
ISSN:0031-3203
Abstract Highlights:
•A cross-modality masked autoencoder is proposed to extract complementary information.
•Complementary information is enhanced through a dual-dimensional Transformer.
•Superior to state-of-the-art methods in maintaining saliency and texture fidelity.
Infrared and visible image fusion aims to synthesize a fused image that contains prominent targets and rich texture details. Effectively extracting and integrating cross-modality information remains a major challenge. In this paper, we propose an image fusion method based on a cross-modality masked autoencoder (CMMAE), called CMMAEFuse. First, we train the CMMAE, which uses information from one modality to supplement the other through a cross-modality feature interaction module, thereby enhancing the encoder's ability to extract complementary information. Subsequently, we design a dual-dimensional Transformer (DDT) that fuses the deep features extracted by the encoder to reconstruct the fused image. The DDT captures global interactions across the spatial and channel dimensions, exchanging information between them through a spatial interaction module and a channel interaction module; this cross-dimensional feature aggregation enhances complementary information and reduces redundancy. Extensive experiments demonstrate that CMMAEFuse surpasses state-of-the-art methods. In addition, an object-detection application shows that CMMAEFuse improves the performance of downstream tasks.
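The cross-modality masking idea in the abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's CMMAE: real features are learned tensors, and `mask_patches` / `cross_modal_fill` are hypothetical stand-ins for the random patch masking and the cross-modality feature interaction module.

```python
# Toy sketch (assumption-labeled, not the paper's implementation): masked
# patches of one modality are reconstructed with help from the other, the
# core idea behind cross-modality masked autoencoding.
import random

def mask_patches(signal, mask_ratio, patch, rng):
    """Zero out a random subset of fixed-size patches; return (masked, mask)."""
    n_patches = len(signal) // patch
    masked_ids = set(rng.sample(range(n_patches), int(n_patches * mask_ratio)))
    out, mask = list(signal), [False] * len(signal)
    for p in masked_ids:
        for i in range(p * patch, (p + 1) * patch):
            out[i], mask[i] = 0.0, True
    return out, mask

def cross_modal_fill(masked, mask, other):
    """Fill masked positions from the other modality -- a trivial stand-in
    for the learned cross-modality feature interaction module."""
    return [other[i] if m else v for i, (v, m) in enumerate(zip(masked, mask))]

rng = random.Random(0)
infrared = [float(i % 4) for i in range(16)]  # toy IR "feature" vector
visible = [float(i % 4) for i in range(16)]   # toy VIS vector, identical here

ir_masked, mask = mask_patches(infrared, mask_ratio=0.5, patch=4, rng=rng)
reconstructed = cross_modal_fill(ir_masked, mask, visible)
print(reconstructed == infrared)  # True: the other modality restores the gaps
```

When the two modalities carry complementary rather than identical content, the fill is only approximate, which is exactly what pushes a trained encoder to extract complementary cross-modality features rather than copy either input.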
ArticleNumber 112767
Author Cao, Jinde
Wang, Xue
Bi, Cong
Qian, Wenhua
Shao, Qiuhan
Yan, Kaixiang
Author_xml – sequence: 1
  givenname: Cong
  orcidid: 0009-0007-8562-5964
  surname: Bi
  fullname: Bi, Cong
  organization: School of Information Science and Engineering, Yunnan University, Kunming, 650091, China
– sequence: 2
  givenname: Wenhua
  orcidid: 0000-0002-2895-2121
  surname: Qian
  fullname: Qian, Wenhua
  email: whqian@ynu.edu.cn
  organization: School of Information Science and Engineering, Yunnan University, Kunming, 650091, China
– sequence: 3
  givenname: Qiuhan
  orcidid: 0009-0005-1505-1569
  surname: Shao
  fullname: Shao, Qiuhan
  organization: School of Information Science and Engineering, Yunnan University, Kunming, 650091, China
– sequence: 4
  givenname: Jinde
  orcidid: 0000-0003-3133-7119
  surname: Cao
  fullname: Cao, Jinde
  organization: The School of Mathematics, Southeast University, Nanjing, 210096, China
– sequence: 5
  givenname: Xue
  orcidid: 0000-0001-6674-8140
  surname: Wang
  fullname: Wang, Xue
  organization: School of Information Science and Engineering, Yunnan University, Kunming, 650091, China
– sequence: 6
  givenname: Kaixiang
  orcidid: 0000-0002-6441-3352
  surname: Yan
  fullname: Yan, Kaixiang
  organization: School of Information Science and Engineering, Yunnan University, Kunming, 650091, China
Cites_doi 10.1109/TMM.2021.3129609
10.1109/TIM.2022.3218574
10.1109/TCSVT.2023.3289170
10.1109/TMM.2020.2997127
10.1109/TMM.2023.3326296
10.1016/j.inffus.2021.02.023
10.1016/j.inffus.2024.102352
10.1016/j.inffus.2024.102359
10.1109/TPAMI.2020.3012548
10.1016/j.patcog.2024.111041
10.1016/j.inffus.2023.102147
10.1016/j.inffus.2022.03.007
10.1016/j.patcog.2024.110984
10.1016/j.inffus.2022.10.034
10.1016/j.inffus.2021.12.004
10.1016/j.inffus.2024.102655
10.1109/JAS.2022.105686
10.1016/j.patcog.2025.111457
10.1016/j.patcog.2025.111391
10.1109/TIP.2025.3541562
10.1016/j.inffus.2018.09.004
10.1109/TIP.2020.2977573
10.1109/TMM.2022.3192661
10.1016/j.inffus.2022.11.010
10.1016/j.inffus.2022.12.007
ContentType Journal Article
Copyright 2025 Elsevier Ltd
Copyright_xml – notice: 2025 Elsevier Ltd
DBID AAYXX
CITATION
DOI 10.1016/j.patcog.2025.112767
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
ExternalDocumentID 10_1016_j_patcog_2025_112767
S003132032501430X
ISSN 0031-3203
IsPeerReviewed true
IsScholarly true
Keywords Cross-modality feature interaction
Infrared and visible image
Transformer
Masked autoencoder
Image fusion
Language English
LinkModel OpenURL
ORCID 0000-0002-2895-2121
0000-0002-6441-3352
0000-0003-3133-7119
0009-0007-8562-5964
0009-0005-1505-1569
0000-0001-6674-8140
ParticipantIDs crossref_primary_10_1016_j_patcog_2025_112767
elsevier_sciencedirect_doi_10_1016_j_patcog_2025_112767
PublicationCentury 2000
PublicationDate April 2026
2026-04-00
PublicationDateYYYYMMDD 2026-04-01
PublicationDate_xml – month: 04
  year: 2026
  text: April 2026
PublicationDecade 2020
PublicationTitle Pattern Recognition
PublicationYear 2026
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
References Cheng, Xu, Wu (bib0038) 2023; 92
Liu, Liu, Wu, Ma, Fan, Liu (bib0039) 2023
Zhang, Xu, Xiao, Guo, Ma (bib0011) 2020; 34
Li, Jiang, Liang, Ma, Nie (bib0006) 2025; 34
Tang, Yuan, Zhang, Jiang, Ma (bib0037) 2022; 83
Xu, Ma, Jiang, Guo, Ling (bib0035) 2020; 44
Su, Huang, Li, Zuo, Liu (bib0020) 2022; 71
Ma, Xu, Jiang, Mei, Zhang (bib0008) 2020; 29
Li, Chen, Liu, Ma (bib0010) 2023
Ma, Yu, Liang, Li, Jiang (bib0025) 2019; 48
Li, Zhu, Li, Chen, Yang (bib0013) 2022; 71
Liu, Lin, Cao, Hu, Wei, Zhang, Lin, Guo (bib0032) 2021
Wang, Fang, Zhao, Pan, Li, Li (bib0016) 2025; 158
Li, Huo, Li, Wang, Feng (bib0026) 2020; 23
Liang, Zeng, Zhang (bib0028) 2022
Tang, Xiang, Zhang, Gong, Ma (bib0023) 2023; 91
Tang, He, Liu (bib0015) 2022; 25
Li, Wu (bib0017) 2024; 103
Ali, Touvron, Caron, Bojanowski, Douze, Joulin, Laptev, Neverova, Synnaeve, Verbeek (bib0033) 2021; 34
Li, Wu, Kittler (bib0021) 2021; 73
Xu, Wang, Ma (bib0004) 2021; 70
Liu, Fan, Huang, Wu, Liu, Zhong, Luo (bib0036) 2022
Chen, Gu, Liu, Magid, Dong, Wang, Pfister, Zhu (bib0031) 2023
Jia, Zhu, Li, Tang, Zhou (bib0034) 2021
Zhang, Wang, Wu, Chen, Zheng, Cao, Zeng, Cai (bib0001) 2025; 158
Tang, Yuan, Ma (bib0012) 2022; 82
Guo, Luo, Liu, Zhang, Wu (bib0003) 2025; 162
Rao, Wu, Han, Wang, Yang, Lei, Zhou, Bai, Xing (bib0027) 2023; 92
Rao, Xu, Wu (bib0030) 2023
Park, Vien, Lee (bib0019) 2024; 34
Ming, Xiao, Liu, Zheng, Xiao (bib0002) 2025; 163
Zheng, Zhou, Huang, Zhao (bib0024) 2024; 109
Vs, Valanarasu, Oza, Patel (bib0029) 2022
Tang, Chen, Huang, Ma (bib0005) 2023; 26
Liu, Huo, Li, Pang, Zheng (bib0022) 2024; 108
Ma, Tang, Fan, Huang, Mei, Ma (bib0014) 2022; 9
Ma, Zhang, Shao, Liang, Xu (bib0007) 2020; 70
Zhou, Wu, Zhang, Ma, Ling (bib0009) 2021; 25
Zhang, Li, Xu, Wu, Kittler (bib0018) 2025; 114
Yi, Xu, Zhang, Tang, Ma (bib0040) 2024
References_xml – volume: 34
  start-page: 770
  year: 2024
  end-page: 785
  ident: bib0019
  article-title: Cross-modal transformers for infrared and visible image fusion
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– start-page: 1240
  year: 2023
  end-page: 1248
  ident: bib0039
  article-title: Bi-level dynamic learning for jointly multi-modality image fusion and beyond
  publication-title: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
– volume: 71
  start-page: 1
  year: 2022
  end-page: 14
  ident: bib0020
  article-title: Infrared and visible image fusion based on adversarial feature extraction and stable image reconstruction
  publication-title: IEEE Trans. Instrum. Meas.
– start-page: 5802
  year: 2022
  end-page: 5811
  ident: bib0036
  article-title: Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection
  publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
– start-page: 10012
  year: 2021
  end-page: 10022
  ident: bib0032
  article-title: Swin transformer: hierarchical vision transformer using shifted windows
  publication-title: Proceedings of the IEEE/CVF International Conference on Computer Vision
– volume: 34
  start-page: 20014
  year: 2021
  end-page: 20027
  ident: bib0033
  article-title: XCIT: cross-covariance image transformers
  publication-title: Adv. Neural Inf. Process. Syst.
– volume: 103
  year: 2024
  ident: bib0017
  article-title: CrossFuse: a novel cross attention mechanism based infrared and visible image fusion approach
  publication-title: Inform. Fusion
– volume: 26
  start-page: 4776
  year: 2023
  end-page: 4791
  ident: bib0005
  article-title: CAMF: an interpretable infrared and visible image fusion network based on class activation mapping
  publication-title: IEEE Trans. Multimed.
– volume: 70
  start-page: 1
  year: 2020
  end-page: 14
  ident: bib0007
  article-title: GANMcC: a generative adversarial network with multiclassification constraints for infrared and visible image fusion
  publication-title: IEEE Trans. Instrum. Meas.
– volume: 25
  start-page: 635
  year: 2021
  end-page: 648
  ident: bib0009
  article-title: Semantic-supervised infrared and visible image fusion via a dual-discriminator generative adversarial network
  publication-title: IEEE Trans. Multimed.
– volume: 108
  year: 2024
  ident: bib0022
  article-title: A semantic-driven coupled network for infrared and visible image fusion
  publication-title: Inform. Fusion
– volume: 23
  start-page: 1383
  year: 2020
  end-page: 1396
  ident: bib0026
  article-title: AttentionFGAN: infrared and visible image fusion using attention-based generative adversarial networks
  publication-title: IEEE Trans. Multimed.
– volume: 48
  start-page: 11
  year: 2019
  end-page: 26
  ident: bib0025
  article-title: FusionGAN: a generative adversarial network for infrared and visible image fusion
  publication-title: Inform. fusion
– volume: 92
  start-page: 80
  year: 2023
  end-page: 92
  ident: bib0038
  article-title: MUFusion: a general unsupervised image fusion network based on memory unit
  publication-title: Inform. Fusion
– volume: 29
  start-page: 4980
  year: 2020
  end-page: 4995
  ident: bib0008
  article-title: DDcGAN: a dual-discriminator conditional generative adversarial network for multi-resolution image fusion
  publication-title: IEEE Trans. Image Process.
– volume: 34
  start-page: 12797
  year: 2020
  end-page: 12804
  ident: bib0011
  article-title: Rethinking the image fusion: a fast unified image fusion network based on proportional maintenance of gradient and intensity
  publication-title: Proceedings of the AAAI Conference on Artificial Intelligence
– start-page: 3566
  year: 2022
  end-page: 3570
  ident: bib0029
  article-title: Image fusion transformer
  publication-title: 2022 IEEE International Conference on Image Processing (ICIP)
– volume: 158
  year: 2025
  ident: bib0016
  article-title: MMAE: a universal image fusion method via mask attention mechanism
  publication-title: Pattern Recognit.
– volume: 70
  start-page: 1
  year: 2021
  end-page: 13
  ident: bib0004
  article-title: DRF: disentangled representation for visible and infrared image fusion
  publication-title: IEEE Trans. Instrum. Meas.
– volume: 9
  start-page: 1200
  year: 2022
  end-page: 1217
  ident: bib0014
  article-title: SwinFusion: cross-domain long-range learning for general image fusion via swin transformer
  publication-title: IEEE/CAA J. Autom. Sin.
– start-page: 27026
  year: 2024
  end-page: 27035
  ident: bib0040
  article-title: Text-if: leveraging semantic text guidance for degradation-aware and interactive image fusion
  publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
– volume: 162
  year: 2025
  ident: bib0003
  article-title: SAM-guided multi-level collaborative transformer for infrared and visible image fusion
  publication-title: Pattern Recognit.
– volume: 92
  start-page: 336
  year: 2023
  end-page: 349
  ident: bib0027
  article-title: AT-GAN: a generative adversarial network with attention and transition for infrared and visible image fusion
  publication-title: Inform. Fusion
– start-page: 3496
  year: 2021
  end-page: 3504
  ident: bib0034
  article-title: LLVIP: a visible-infrared paired dataset for low-light vision
  publication-title: Proceedings of the IEEE/CVF International Conference on Computer Vision
– volume: 91
  start-page: 477
  year: 2023
  end-page: 493
  ident: bib0023
  article-title: DIVFusion: darkness-free infrared and visible image fusion
  publication-title: Inform. Fusion
– volume: 71
  start-page: 1
  year: 2022
  end-page: 14
  ident: bib0013
  article-title: CGTF: convolution-guided transformer for infrared and visible image fusion
  publication-title: IEEE Trans. Instrum. Meas.
– volume: 114
  year: 2025
  ident: bib0018
  article-title: DDBFusion: an unified image decomposition and fusion framework based on dual decomposition and Bézier curves
  publication-title: Inform. Fusion
– volume: 34
  start-page: 1340
  year: 2025
  end-page: 1353
  ident: bib0006
  article-title: MaeFuse: transferring omni features with pretrained masked autoencoders for infrared and visible image fusion via guided training
  publication-title: IEEE Trans. Image Process.
– volume: 73
  start-page: 72
  year: 2021
  end-page: 86
  ident: bib0021
  article-title: RFN-Nest: an end-to-end residual fusion network for infrared and visible images
  publication-title: Inform. Fusion
– volume: 109
  year: 2024
  ident: bib0024
  article-title: Frequency integration and spatial compensation network for infrared and visible image fusion
  publication-title: Inform. Fusion
– volume: 25
  start-page: 5413
  year: 2022
  end-page: 5428
  ident: bib0015
  article-title: YDTR: infrared and visible image fusion via Y-shape dynamic transformer
  publication-title: IEEE Trans. Multimed.
– start-page: 5657
  year: 2022
  end-page: 5666
  ident: bib0028
  article-title: Details or artifacts: a locally discriminative learning approach to realistic image super-resolution
  publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
– volume: 82
  start-page: 28
  year: 2022
  end-page: 42
  ident: bib0012
  article-title: Image fusion in the loop of high-level vision tasks: a semantic-aware real-time infrared and visible image fusion network
  publication-title: Inform. Fusion
– start-page: 4471
  year: 2023
  end-page: 4479
  ident: bib0010
  article-title: Learning a graph neural network with cross modality interaction for image fusion
  publication-title: Proceedings of the 31st ACM International Conference on Multimedia
– year: 2023
  ident: bib0030
  article-title: TGFuse: an infrared and visible image fusion approach based on transformer and generative adversarial network
  publication-title: IEEE Trans. Image Process.
– volume: 158
  year: 2025
  ident: bib0001
  article-title: UniRTL: a universal RGBT and low-light benchmark for object tracking
  publication-title: Pattern Recognit.
– volume: 83
  start-page: 79
  year: 2022
  end-page: 92
  ident: bib0037
  article-title: PIAFusion: a progressive infrared and visible image fusion network based on illumination aware
  publication-title: Inform. Fusion
– start-page: 1692
  year: 2023
  end-page: 1703
  ident: bib0031
  article-title: Masked image training for generalizable deep image denoising
  publication-title: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
– volume: 44
  start-page: 502
  year: 2020
  end-page: 518
  ident: bib0035
  article-title: U2Fusion: a unified unsupervised image fusion network
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– volume: 163
  year: 2025
  ident: bib0002
  article-title: SSDFusion: a scene-semantic decomposition approach for visible and infrared image fusion
  publication-title: Pattern Recognit.
– volume: 25
  start-page: 635
  year: 2021
  ident: 10.1016/j.patcog.2025.112767_bib0009
  article-title: Semantic-supervised infrared and visible image fusion via a dual-discriminator generative adversarial network
  publication-title: IEEE Trans. Multimed.
  doi: 10.1109/TMM.2021.3129609
– start-page: 3566
  year: 2022
  ident: 10.1016/j.patcog.2025.112767_bib0029
  article-title: Image fusion transformer
– volume: 71
  start-page: 1
  year: 2022
  ident: 10.1016/j.patcog.2025.112767_bib0013
  article-title: CGTF: convolution-guided transformer for infrared and visible image fusion
  publication-title: IEEE Trans. Instrum. Meas.
  doi: 10.1109/TIM.2022.3218574
– volume: 34
  start-page: 770
  issue: 2
  year: 2024
  ident: 10.1016/j.patcog.2025.112767_bib0019
  article-title: Cross-modal transformers for infrared and visible image fusion
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2023.3289170
– volume: 23
  start-page: 1383
  year: 2020
  ident: 10.1016/j.patcog.2025.112767_bib0026
  article-title: AttentionFGAN: infrared and visible image fusion using attention-based generative adversarial networks
  publication-title: IEEE Trans. Multimed.
  doi: 10.1109/TMM.2020.2997127
– volume: 26
  start-page: 4776
  year: 2023
  ident: 10.1016/j.patcog.2025.112767_bib0005
  article-title: CAMF: an interpretable infrared and visible image fusion network based on class activation mapping
  publication-title: IEEE Trans. Multimed.
  doi: 10.1109/TMM.2023.3326296
– volume: 73
  start-page: 72
  year: 2021
  ident: 10.1016/j.patcog.2025.112767_bib0021
  article-title: RFN-Nest: an end-to-end residual fusion network for infrared and visible images
  publication-title: Inform. Fusion
  doi: 10.1016/j.inffus.2021.02.023
– volume: 108
  year: 2024
  ident: 10.1016/j.patcog.2025.112767_bib0022
  article-title: A semantic-driven coupled network for infrared and visible image fusion
  publication-title: Inform. Fusion
  doi: 10.1016/j.inffus.2024.102352
– start-page: 4471
  year: 2023
  ident: 10.1016/j.patcog.2025.112767_bib0010
  article-title: Learning a graph neural network with cross modality interaction for image fusion
– volume: 109
  year: 2024
  ident: 10.1016/j.patcog.2025.112767_bib0024
  article-title: Frequency integration and spatial compensation network for infrared and visible image fusion
  publication-title: Inform. Fusion
  doi: 10.1016/j.inffus.2024.102359
– start-page: 27026
  year: 2024
  ident: 10.1016/j.patcog.2025.112767_bib0040
  article-title: Text-if: leveraging semantic text guidance for degradation-aware and interactive image fusion
– volume: 44
  start-page: 502
  issue: 1
  year: 2020
  ident: 10.1016/j.patcog.2025.112767_bib0035
  article-title: U2Fusion: a unified unsupervised image fusion network
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
  doi: 10.1109/TPAMI.2020.3012548
– start-page: 5802
  year: 2022
  ident: 10.1016/j.patcog.2025.112767_bib0036
  article-title: Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection
– volume: 70
  start-page: 1
  year: 2020
  ident: 10.1016/j.patcog.2025.112767_bib0007
  article-title: GANMcC: a generative adversarial network with multiclassification constraints for infrared and visible image fusion
  publication-title: IEEE Trans. Instrum. Meas.
– volume: 158
  year: 2025
  ident: 10.1016/j.patcog.2025.112767_bib0016
  article-title: MMAE: a universal image fusion method via mask attention mechanism
  publication-title: Pattern Recognit.
  doi: 10.1016/j.patcog.2024.111041
– volume: 71
  start-page: 1
  year: 2022
  ident: 10.1016/j.patcog.2025.112767_bib0020
  article-title: Infrared and visible image fusion based on adversarial feature extraction and stable image reconstruction
  publication-title: IEEE Trans. Instrum. Meas.
– start-page: 5657
  year: 2022
  ident: 10.1016/j.patcog.2025.112767_bib0028
  article-title: Details or artifacts: a locally discriminative learning approach to realistic image super-resolution
– volume: 103
  year: 2024
  ident: 10.1016/j.patcog.2025.112767_bib0017
  article-title: CrossFuse: a novel cross attention mechanism based infrared and visible image fusion approach
  publication-title: Inform. Fusion
  doi: 10.1016/j.inffus.2023.102147
– start-page: 10012
  year: 2021
  ident: 10.1016/j.patcog.2025.112767_bib0032
  article-title: Swin transformer: hierarchical vision transformer using shifted windows
– volume: 83
  start-page: 79
  year: 2022
  ident: 10.1016/j.patcog.2025.112767_bib0037
  article-title: PIAFusion: a progressive infrared and visible image fusion network based on illumination aware
  publication-title: Inform. Fusion
  doi: 10.1016/j.inffus.2022.03.007
– volume: 158
  year: 2025
  ident: 10.1016/j.patcog.2025.112767_bib0001
  article-title: UniRTL: a universal RGBT and low-light benchmark for object tracking
  publication-title: Pattern Recognit.
  doi: 10.1016/j.patcog.2024.110984
– start-page: 1240
  year: 2023
  ident: 10.1016/j.patcog.2025.112767_bib0039
  article-title: Bi-level dynamic learning for jointly multi-modality image fusion and beyond
– start-page: 3496
  year: 2021
  ident: 10.1016/j.patcog.2025.112767_bib0034
  article-title: LLVIP: a visible-infrared paired dataset for low-light vision
– volume: 91
  start-page: 477
  year: 2023
  ident: 10.1016/j.patcog.2025.112767_bib0023
  article-title: DIVFusion: darkness-free infrared and visible image fusion
  publication-title: Inform. Fusion
  doi: 10.1016/j.inffus.2022.10.034
– volume: 82
  start-page: 28
  year: 2022
  ident: 10.1016/j.patcog.2025.112767_bib0012
  article-title: Image fusion in the loop of high-level vision tasks: a semantic-aware real-time infrared and visible image fusion network
  publication-title: Inform. Fusion
  doi: 10.1016/j.inffus.2021.12.004
– volume: 114
  year: 2025
  ident: 10.1016/j.patcog.2025.112767_bib0018
  article-title: DDBFusion: an unified image decomposition and fusion framework based on dual decomposition and Bézier curves
  publication-title: Inform. Fusion
  doi: 10.1016/j.inffus.2024.102655
– volume: 9
  start-page: 1200
  issue: 7
  year: 2022
  ident: 10.1016/j.patcog.2025.112767_bib0014
  article-title: SwinFusion: cross-domain long-range learning for general image fusion via swin transformer
  publication-title: IEEE/CAA J. Autom. Sin.
  doi: 10.1109/JAS.2022.105686
– volume: 163
  year: 2025
  ident: 10.1016/j.patcog.2025.112767_bib0002
  article-title: SSDFusion: a scene-semantic decomposition approach for visible and infrared image fusion
  publication-title: Pattern Recognit.
  doi: 10.1016/j.patcog.2025.111457
– year: 2023
  ident: 10.1016/j.patcog.2025.112767_bib0030
  article-title: TGFuse: an infrared and visible image fusion approach based on transformer and generative adversarial network
  publication-title: IEEE Trans. Image Process.
– volume: 162
  year: 2025
  ident: 10.1016/j.patcog.2025.112767_bib0003
  article-title: SAM-guided multi-level collaborative transformer for infrared and visible image fusion
  publication-title: Pattern Recognit.
  doi: 10.1016/j.patcog.2025.111391
– volume: 34
  start-page: 1340
  year: 2025
  ident: 10.1016/j.patcog.2025.112767_bib0006
  article-title: MaeFuse: transferring omni features with pretrained masked autoencoders for infrared and visible image fusion via guided training
  publication-title: IEEE Trans. Image Process.
  doi: 10.1109/TIP.2025.3541562
– volume: 48
  start-page: 11
  year: 2019
  ident: 10.1016/j.patcog.2025.112767_bib0025
  article-title: FusionGAN: a generative adversarial network for infrared and visible image fusion
  publication-title: Inform. fusion
  doi: 10.1016/j.inffus.2018.09.004
– start-page: 1692
  year: 2023
  ident: 10.1016/j.patcog.2025.112767_bib0031
  article-title: Masked image training for generalizable deep image denoising
– volume: 34
  start-page: 12797
  year: 2020
  ident: 10.1016/j.patcog.2025.112767_bib0011
  article-title: Rethinking the image fusion: a fast unified image fusion network based on proportional maintenance of gradient and intensity
– volume: 29
  start-page: 4980
  year: 2020
  ident: 10.1016/j.patcog.2025.112767_bib0008
  article-title: DDcGAN: a dual-discriminator conditional generative adversarial network for multi-resolution image fusion
  publication-title: IEEE Trans. Image Process.
  doi: 10.1109/TIP.2020.2977573
– volume: 34
  start-page: 20014
  year: 2021
  ident: 10.1016/j.patcog.2025.112767_bib0033
  article-title: XCIT: cross-covariance image transformers
  publication-title: Adv. Neural Inf. Process. Syst.
– volume: 25
  start-page: 5413
  year: 2022
  ident: 10.1016/j.patcog.2025.112767_bib0015
  article-title: YDTR: infrared and visible image fusion via Y-shape dynamic transformer
  publication-title: IEEE Trans. Multimed.
  doi: 10.1109/TMM.2022.3192661
– volume: 92
  start-page: 80
  year: 2023
  ident: 10.1016/j.patcog.2025.112767_bib0038
  article-title: MUFusion: a general unsupervised image fusion network based on memory unit
  publication-title: Inform. Fusion
  doi: 10.1016/j.inffus.2022.11.010
– volume: 70
  start-page: 1
  year: 2021
  ident: 10.1016/j.patcog.2025.112767_bib0004
  article-title: DRF: disentangled representation for visible and infrared image fusion
  publication-title: IEEE Trans. Instrum. Meas.
– volume: 92
  start-page: 336
  year: 2023
  ident: 10.1016/j.patcog.2025.112767_bib0027
  article-title: AT-GAN: a generative adversarial network with attention and transition for infrared and visible image fusion
  publication-title: Inform. Fusion
  doi: 10.1016/j.inffus.2022.12.007
SourceID crossref
elsevier
SourceType Index Database
Publisher
StartPage 112767
SubjectTerms Cross-modality feature interaction
Image fusion
Infrared and visible image
Masked autoencoder
Transformer
Title Cross-modality masked autoencoder for infrared and visible image fusion
URI https://dx.doi.org/10.1016/j.patcog.2025.112767
Volume 172