ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection


Bibliographic Details
Published in: Pattern Recognition, Vol. 145, p. 109913
Main Authors: Shen, Jifeng, Chen, Yifei, Liu, Yue, Zuo, Xin, Fan, Heng, Yang, Wankou
Format: Journal Article
Language: English
Published: Elsevier Ltd 01.01.2024
Subjects:
ISSN:0031-3203, 1873-5142
Abstract Effective feature fusion of multispectral images plays a crucial role in multispectral object detection. Previous studies have demonstrated the effectiveness of feature fusion using convolutional neural networks, but these methods are sensitive to image misalignment due to an inherent deficiency in local-range feature interaction, resulting in performance degradation. To address this issue, a novel feature fusion framework of dual cross-attention transformers is proposed to model global feature interaction and capture complementary information across modalities simultaneously. This framework enhances the discriminability of object features through a query-guided cross-attention mechanism, leading to improved performance. However, stacking multiple transformer blocks for feature enhancement incurs a large number of parameters and high spatial complexity. To handle this, inspired by the human process of reviewing knowledge, an iterative interaction mechanism is proposed that shares parameters among block-wise multimodal transformers, reducing model complexity and computation cost. The proposed method is general and effective, and can be integrated into different detection frameworks and used with different backbones. Experimental results on the KAIST, FLIR, and VEDAI datasets show that the proposed method achieves superior performance and faster inference, making it suitable for various practical scenarios. Code will be available at https://github.com/chanchanchan97/ICAFusion.
•A novel dual cross-attention feature fusion method is proposed for multispectral object detection, which simultaneously aggregates complementary information from RGB and thermal images.
•An iterative learning strategy is tailored for efficient multispectral feature fusion, which further improves model performance without any increase in learnable parameters.
•The proposed feature fusion method is both generalizable and effective, and can be plugged into different backbones and equipped with different detection frameworks.
•The proposed CFE/ICFE module can function with different input image modalities, which provides a feasible solution when one of the modalities is missing or has poor quality.
•The proposed method achieves state-of-the-art results on the KAIST, FLIR, and VEDAI datasets, while also obtaining very fast inference speed.
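The two ideas in the abstract — cross-attention that lets each modality query the other globally, and one shared set of weights reused across iterations instead of stacking new blocks — can be sketched as follows. This is a minimal single-head NumPy illustration under our own assumptions, not the authors' implementation; all function names, weight shapes, and iteration counts are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, kv_feats, Wq, Wk, Wv):
    # Queries come from one modality, keys/values from the other, so every
    # token attends to the complementary spectrum over the whole image
    # (global interaction, unlike a local convolution).
    Q, K, V = query_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def iterative_fusion(rgb, thermal, weights, iterations=2):
    # The SAME weight tuple is reused every iteration, mirroring the
    # parameter sharing across block-wise transformers: more refinement
    # steps without more learnable parameters.
    for _ in range(iterations):
        rgb = rgb + cross_attention(rgb, thermal, *weights)        # residual update
        thermal = thermal + cross_attention(thermal, rgb, *weights)
    return np.concatenate([rgb, thermal], axis=-1)                 # fused tokens

rng = np.random.default_rng(0)
d = 16
weights = tuple(rng.standard_normal((d, d)) * 0.1 for _ in range(3))  # Wq, Wk, Wv
rgb = rng.standard_normal((49, d))      # e.g. a 7x7 feature map flattened to tokens
thermal = rng.standard_normal((49, d))
fused = iterative_fusion(rgb, thermal, weights)
print(fused.shape)  # (49, 32)
```

Doubling `iterations` here adds compute but no parameters, which is the cost trade-off the abstract describes.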
ArticleNumber 109913
Author Shen, Jifeng
Zuo, Xin
Fan, Heng
Chen, Yifei
Yang, Wankou
Liu, Yue
Author_xml – sequence: 1
  givenname: Jifeng
  surname: Shen
  fullname: Shen, Jifeng
  email: shenjifeng@ujs.edu.cn
  organization: School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China
– sequence: 2
  givenname: Yifei
  surname: Chen
  fullname: Chen, Yifei
  organization: School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China
– sequence: 3
  givenname: Yue
  surname: Liu
  fullname: Liu, Yue
  organization: School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China
– sequence: 4
  givenname: Xin
  surname: Zuo
  fullname: Zuo, Xin
  organization: School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, 212003, China
– sequence: 5
  givenname: Heng
  surname: Fan
  fullname: Fan, Heng
  organization: Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, USA
– sequence: 6
  givenname: Wankou
  surname: Yang
  fullname: Yang, Wankou
  organization: School of Automation, Southeast University, Nanjing, 210096, China
CitedBy_id crossref_primary_10_1016_j_eswa_2025_128744
crossref_primary_10_1016_j_eswa_2025_129679
crossref_primary_10_3390_s25164964
crossref_primary_10_3390_agronomy15092199
crossref_primary_10_1007_s11760_024_03337_4
crossref_primary_10_3389_fpls_2025_1538051
crossref_primary_10_1109_TIM_2025_3569003
crossref_primary_10_1007_s10044_025_01515_9
crossref_primary_10_1016_j_cosrev_2025_100804
crossref_primary_10_1109_TCPMT_2024_3491163
crossref_primary_10_1016_j_neucom_2025_129913
crossref_primary_10_1109_TMM_2025_3543056
crossref_primary_10_1109_TAI_2024_3436037
crossref_primary_10_1088_1361_6501_adfb9e
crossref_primary_10_1109_TGRS_2025_3578876
crossref_primary_10_1016_j_eswa_2025_129677
crossref_primary_10_1016_j_neucom_2025_129595
crossref_primary_10_1109_TCSVT_2024_3454631
crossref_primary_10_1109_TGRS_2025_3561133
crossref_primary_10_1038_s41598_025_85697_6
crossref_primary_10_1007_s12204_025_2835_3
crossref_primary_10_1016_j_asoc_2025_113683
crossref_primary_10_3390_rs16234451
crossref_primary_10_1109_TGRS_2025_3577046
crossref_primary_10_1049_itr2_12562
crossref_primary_10_1016_j_ipm_2025_104290
crossref_primary_10_1016_j_renene_2025_122926
crossref_primary_10_3390_electronics14132684
crossref_primary_10_1109_TMM_2024_3410113
crossref_primary_10_1016_j_oceaneng_2025_121185
crossref_primary_10_3390_rs17101723
crossref_primary_10_1016_j_eswa_2025_128996
crossref_primary_10_1109_TITS_2024_3412417
crossref_primary_10_3390_s25113392
crossref_primary_10_1109_JSTARS_2025_3553747
crossref_primary_10_1109_TGRS_2025_3530085
crossref_primary_10_1016_j_cja_2025_103781
crossref_primary_10_1016_j_neucom_2024_128957
crossref_primary_10_1117_1_JEI_34_3_033046
crossref_primary_10_3390_f16071088
crossref_primary_10_1016_j_aiia_2025_05_002
crossref_primary_10_1109_TGRS_2025_3552787
crossref_primary_10_1007_s00371_025_04071_9
crossref_primary_10_1016_j_asoc_2025_113645
crossref_primary_10_1016_j_patcog_2024_111040
crossref_primary_10_3390_s24041168
crossref_primary_10_1109_JSEN_2024_3386709
crossref_primary_10_1109_TIV_2024_3443264
crossref_primary_10_1016_j_compag_2025_109957
crossref_primary_10_1016_j_measurement_2025_117043
crossref_primary_10_1016_j_dsp_2025_104996
crossref_primary_10_1109_LGRS_2024_3440045
crossref_primary_10_1088_1361_6501_adcf44
crossref_primary_10_1016_j_asoc_2024_111971
crossref_primary_10_3390_rs17152650
crossref_primary_10_1016_j_imavis_2025_105468
crossref_primary_10_1016_j_brainres_2025_149507
crossref_primary_10_1109_JSTARS_2025_3571391
crossref_primary_10_1109_TCSVT_2025_3539625
crossref_primary_10_1016_j_patcog_2025_111425
crossref_primary_10_1109_TGRS_2025_3586620
crossref_primary_10_1109_TGRS_2024_3446814
crossref_primary_10_1016_j_patcog_2024_110854
crossref_primary_10_1038_s41598_025_88871_y
crossref_primary_10_3390_s24206717
crossref_primary_10_3390_s25072306
crossref_primary_10_1016_j_sigpro_2025_110231
crossref_primary_10_1109_JSTARS_2025_3603506
crossref_primary_10_1109_TCSVT_2024_3418965
crossref_primary_10_1016_j_engappai_2025_112165
crossref_primary_10_1016_j_patcog_2025_111383
crossref_primary_10_1109_JAS_2025_125333
crossref_primary_10_1016_j_neucom_2025_130811
crossref_primary_10_1016_j_knosys_2025_113056
crossref_primary_10_1109_TNNLS_2024_3443455
crossref_primary_10_1016_j_bspc_2025_108421
crossref_primary_10_1016_j_eswa_2025_128716
crossref_primary_10_1007_s11042_024_19405_3
crossref_primary_10_1109_JSEN_2025_3566381
crossref_primary_10_1109_JSTARS_2024_3504549
crossref_primary_10_3390_s25010103
crossref_primary_10_1109_LRA_2025_3550707
crossref_primary_10_1038_s41598_025_03567_7
crossref_primary_10_1016_j_patcog_2024_110509
crossref_primary_10_3390_rs17152706
crossref_primary_10_1109_JSEN_2025_3559057
crossref_primary_10_1016_j_asoc_2025_113364
crossref_primary_10_1016_j_imavis_2024_105344
crossref_primary_10_1109_JSEN_2024_3374388
crossref_primary_10_1016_j_inffus_2025_102939
crossref_primary_10_1016_j_knosys_2025_113562
crossref_primary_10_1049_ipr2_13124
crossref_primary_10_1177_14759217251328445
crossref_primary_10_3390_electronics13091770
crossref_primary_10_1049_ell2_70093
crossref_primary_10_1016_j_compeleceng_2025_110133
crossref_primary_10_1109_TGRS_2024_3490752
crossref_primary_10_3390_rs17061057
crossref_primary_10_1007_s11030_025_11194_7
crossref_primary_10_3788_LOP251144
crossref_primary_10_1016_j_compmedimag_2025_102584
crossref_primary_10_1109_TGRS_2024_3452550
crossref_primary_10_3390_app15115857
crossref_primary_10_1016_j_inffus_2025_103414
crossref_primary_10_1016_j_infrared_2025_105824
crossref_primary_10_1109_LGRS_2025_3564181
crossref_primary_10_1007_s11227_025_07587_y
crossref_primary_10_3788_gzxb20255406_0610001
crossref_primary_10_1109_MMUL_2025_3525559
crossref_primary_10_1109_TCSVT_2024_3524645
crossref_primary_10_3390_s25185631
crossref_primary_10_1109_TIM_2025_3580844
crossref_primary_10_1109_TIM_2025_3573003
crossref_primary_10_1016_j_dsp_2025_105490
crossref_primary_10_1088_1361_6501_ae0147
crossref_primary_10_1109_ACCESS_2025_3551947
crossref_primary_10_1007_s10489_025_06485_3
crossref_primary_10_3390_jmse13081528
crossref_primary_10_1109_TGRS_2025_3526190
crossref_primary_10_1016_j_neucom_2025_130505
Cites_doi 10.1109/TIP.2018.2867198
10.1145/3418213
10.1109/CVPR52688.2022.00116
10.1016/j.patcog.2018.08.005
10.1016/j.inffus.2018.11.017
10.1109/CVPRW.2019.00135
10.1109/ICCV.2019.00972
10.1109/CVPR42600.2020.01095
10.1016/j.compeleceng.2022.108385
10.1109/CVPR.2018.00913
10.1016/j.inffus.2018.09.015
10.1109/LRA.2021.3099870
10.1109/CVPR.2018.00745
10.1109/CVPR.2019.00060
10.1109/CVPR52688.2022.00493
10.1016/j.patcog.2022.108786
10.1109/TCSVT.2021.3076466
10.1016/j.jvcir.2015.11.002
10.1109/CVPR42600.2020.01155
10.1109/CVPRW.2017.36
10.5244/C.30.73
10.1016/j.infrared.2021.103770
10.1109/CVPR.2016.90
10.1109/CVPR.2017.106
10.1007/978-3-030-01234-2_1
10.1109/CVPR.2015.7298706
10.1109/ICCV48922.2021.00350
10.1109/ICCV.2019.00523
10.1109/WACV48630.2021.00012
10.1609/aaai.v36i3.20187
10.1109/ICCV48922.2021.00468
10.1016/j.patcog.2022.109071
10.1109/TPAMI.2011.155
10.1109/TCDS.2020.3048883
10.1007/s11432-021-3493-7
10.1016/j.patcog.2022.108998
ContentType Journal Article
Copyright 2023 Elsevier Ltd
Copyright_xml – notice: 2023 Elsevier Ltd
DBID AAYXX
CITATION
DOI 10.1016/j.patcog.2023.109913
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1873-5142
ExternalDocumentID 10_1016_j_patcog_2023_109913
S0031320323006118
ISICitedReferencesCount 164
ISSN 0031-3203
IngestDate Tue Nov 18 21:01:58 EST 2025
Sat Nov 29 07:29:35 EST 2025
Fri Feb 23 02:36:02 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Iterative feature fusion
Transformer
Cross-attention
Multispectral object detection
Language English
LinkModel OpenURL
ParticipantIDs crossref_citationtrail_10_1016_j_patcog_2023_109913
crossref_primary_10_1016_j_patcog_2023_109913
elsevier_sciencedirect_doi_10_1016_j_patcog_2023_109913
PublicationCentury 2000
PublicationDate January 2024
2024-01-00
PublicationDateYYYYMMDD 2024-01-01
PublicationDate_xml – month: 01
  year: 2024
  text: January 2024
PublicationDecade 2020
PublicationTitle Pattern recognition
PublicationYear 2024
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
References C. Devaguptapu, N. Akolekar, M. M Sharma, V. N Balasubramanian, Borrow from anywhere: Pseudo multi-modal object detection in thermal imagery, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
Fu, Gu, Ai, Li, Wang (b8) 2021; 116
Zuo, Wang, Liu, Shen, Wang (b22) 2022
X. Bai, Z. Hu, X. Zhu, Q. Huang, Y. Chen, H. Fu, C.-L. Tai, Transfusion: Robust lidar-camera fusion for 3d object detection with transformers, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1090–1099.
FLIR ADA Team, [EB/OL]
L. Zhang, X. Zhu, X. Chen, X. Yang, Z. Lei, Z. Liu, Weakly aligned cross-modal learning for multispectral pedestrian detection, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 5127–5137.
X. Wei, T. Zhang, Y. Li, Y. Zhang, F. Wu, Multi-modality cross attention network for image and sentence matching, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10941–10950.
Li, Hou, Wang, Gao, Xu, Li (b32) 2021; 14
D. Konig, M. Adam, C. Jarvers, G. Layher, H. Neumann, M. Teutsch, Fully convolutional region proposal networks for multispectral person detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 49–56.
Kieu, Bagdanov, Bertini (b50) 2021; 17
C. Li, D. Song, R. Tong, M. Tang, Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation, in: British Machine Vision Conference, BMVC, 2018.
Bochkovskiy, Wang, Liao (b36) 2020
Li, Song, Tong, Tang (b19) 2019; 85
Qingyun, Zhaokui (b43) 2022
Guan, Cao, Yang, Cao, Yang (b47) 2019; 50
Cheng, Han, Zhou, Xu (b3) 2018; 28
H. Zhang, E. Fromont, S. Lefèvre, B. Avignon, Guided attentive feature fusion for multispectral pedestrian detection, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 72–80.
Z. Tian, C. Shen, H. Chen, T. He, Fcos: Fully convolutional one-stage object detection, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9627–9636.
Qingyun, Dapeng, Zhaokui (b10) 2021
Yu, Wang, Chen, Wei (b41) 2014
Bosquet, Cores, Seidenari, Brea, Mucientes, Bimbo (b1) 2023; 133
Raghu, Unterthiner, Kornblith, Zhang, Dosovitskiy (b44) 2021; 34
A. Botach, E. Zheltonozhskii, C. Baskin, End-to-end referring video object segmentation with multimodal transformers, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4985–4995.
Cheng, Lai, Gao, Han (b27) 2023; 66
Kim, Park, Ro (b21) 2021; 32
Liu, Hasan, Liao (b2) 2023; 135
Shen, Liu, Xing (b40) 2022
X. Li, W. Wang, X. Hu, J. Yang, Selective kernel networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 510–519.
T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature pyramid networks for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125.
Shen, Liu, Chen, Zuo, Li, Yang (b18) 2022; 103
J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
Dollar, Wojek, Schiele, Perona (b42) 2011; 34
X. Xie, G. Cheng, J. Wang, X. Yao, J. Han, Oriented R-CNN for object detection, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3520–3529.
Razakarivony, Jurie (b14) 2016; 34
N. Liu, N. Zhang, K. Wan, L. Shao, J. Han, Visual saliency transformer, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4722–4732.
K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, Q. Hu, ECA-Net: Efficient channel attention for deep convolutional neural networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11534–11542.
Simonyan, Zisserman (b34) 2014
(Accessed 6 July 2021).
S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, Path aggregation network for instance segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8759–8768.
Lu, Batra, Parikh, Lee (b9) 2019; 32
S. Hwang, J. Park, N. Kim, Y. Choi, I. So Kweon, Multispectral pedestrian detection: Benchmark dataset and baseline, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1037–1045.
Zhou, Chen, Cao (b15) 2020
Kim, Kim, Kim, Kim, Choi (b48) 2021; 6
J. Liu, S. Zhang, S. Wang, D.N. Metaxas, Multispectral deep neural networks for pedestrian detection, in: 27th British Machine Vision Conference, BMVC 2016, 2016.
S. Woo, J. Park, J.-Y. Lee, I.S. Kweon, Cbam: Convolutional block attention module, in: Proceedings of the European Conference on Computer Vision, ECCV, 2018, pp. 3–19.
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, in: International Conference on Learning Representations, 2020.
Zhang, Liu, Zhang, Yang, Qiao, Huang, Hussain (b6) 2019; 50
Zhang, Fromont, Lefevre, Avignon (b16) 2020
Venkataramanan, Ghodrati, Asano, Porikli, Habibian (b45) 2023
Zhang, Lei, Xie, Fang, Li, Du (b51) 2023; 61
Y. Xiao, M. Yang, C. Li, L. Liu, J. Tang, Attribute-based progressive fusion network for rgbt tracking, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, No. 3, 2022, pp. 2831–2838.
References_xml – reference: A. Botach, E. Zheltonozhskii, C. Baskin, End-to-end referring video object segmentation with multimodal transformers, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4985–4995.
– reference: C. Devaguptapu, N. Akolekar, M. M Sharma, V. N Balasubramanian, Borrow from anywhere: Pseudo multi-modal object detection in thermal imagery, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
– reference: FLIR ADA Team, [EB/OL]
– reference: Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, Q. Hu, ECA-Net: Efficient channel attention for deep convolutional neural networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11534–11542.
– volume: 32
  year: 2019
  ident: b9
  article-title: Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks
  publication-title: Adv. Neural Inf. Process. Syst.
– year: 2022
  ident: b43
  article-title: Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery
  publication-title: Pattern Recognit.
– volume: 135
  year: 2023
  ident: b2
  article-title: Center and scale prediction: Anchor-free approach for pedestrian and face detection
  publication-title: Pattern Recognit.
– year: 2020
  ident: b36
  article-title: Yolov4: Optimal speed and accuracy of object detection
– reference: N. Liu, N. Zhang, K. Wan, L. Shao, J. Han, Visual saliency transformer, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 4722–4732.
– reference: S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, Path aggregation network for instance segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8759–8768.
– volume: 32
  start-page: 1510
  year: 2021
  end-page: 1523
  ident: b21
  article-title: Uncertainty-guided cross-modal learning for robust multispectral pedestrian detection
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– volume: 50
  start-page: 20
  year: 2019
  end-page: 29
  ident: b6
  article-title: Cross-modality interactive attention network for multispectral pedestrian detection
  publication-title: Inf. Fusion
– volume: 17
  start-page: 1
  year: 2021
  end-page: 19
  ident: b50
  article-title: Bottom-up and layerwise domain adaptation for pedestrian detection in thermal images
  publication-title: ACM Trans. Multimed. Comput. Commun. Appl. (TOMM)
– volume: 61
  start-page: 1
  year: 2023
  end-page: 15
  ident: b51
  article-title: SuperYOLO: Super resolution assisted object detection in multimodal remote sensing imagery
  publication-title: IEEE Trans. Geosci. Remote Sens.
– volume: 103
  year: 2022
  ident: b18
  article-title: Mask-guided explicit feature modulation for multispectral pedestrian detection
  publication-title: Comput. Electr. Eng.
– reference: Y. Xiao, M. Yang, C. Li, L. Liu, J. Tang, Attribute-based progressive fusion network for rgbt tracking, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, No. 3, 2022, pp. 2831–2838.
– volume: 133
  year: 2023
  ident: b1
  article-title: A full data augmentation pipeline for small object detection based on generative adversarial networks
  publication-title: Pattern Recognit.
– start-page: 1
  year: 2022
  end-page: 18
  ident: b22
  article-title: LGADet: Light-weight anchor-free multispectral pedestrian detection with mixed local and global attention
  publication-title: Neural Process. Lett.
– reference: D. Konig, M. Adam, C. Jarvers, G. Layher, H. Neumann, M. Teutsch, Fully convolutional region proposal networks for multispectral person detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 49–56.
– reference: A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, in: International Conference on Learning Representations, 2020.
– volume: 34
  start-page: 187
  year: 2016
  end-page: 203
  ident: b14
  article-title: Vehicle detection in aerial imagery: A small target detection benchmark
  publication-title: J. Vis. Commun. Image Represent.
– year: 2023
  ident: b45
  article-title: Skip-attention: Improving vision transformers by paying less attention
– reference: J. Liu, S. Zhang, S. Wang, D.N. Metaxas, Multispectral deep neural networks for pedestrian detection, in: 27th British Machine Vision Conference, BMVC 2016, 2016.
– reference: Z. Tian, C. Shen, H. Chen, T. He, Fcos: Fully convolutional one-stage object detection, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9627–9636.
– volume: 116
  year: 2021
  ident: b8
  article-title: Adaptive spatial pixel-level feature fusion network for multispectral pedestrian detection
  publication-title: Infrared Phys. Technol.
– reference: K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
– reference: X. Wei, T. Zhang, Y. Li, Y. Zhang, F. Wu, Multi-modality cross attention network for image and sentence matching, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10941–10950.
– volume: 28
  start-page: 265
  year: 2018
  end-page: 278
  ident: b3
  article-title: Learning rotation-invariant and fisher discriminative convolutional neural networks for object detection
  publication-title: IEEE Trans. Image Process.
– start-page: 276
  year: 2020
  end-page: 280
  ident: b16
  article-title: Multispectral fusion for object detection with cyclic fuse-and-refine blocks
  publication-title: 2020 IEEE International Conference on Image Processing
– start-page: 364
  year: 2014
  end-page: 375
  ident: b41
  article-title: Mixed pooling for convolutional neural networks
  publication-title: Rough Sets and Knowledge Technology: 9th International Conference, RSKT 2014, Shanghai, China, October 24-26, 2014, Proceedings 9
– reference: (Accessed 6 July 2021).
– start-page: 787
  year: 2020
  end-page: 803
  ident: b15
  article-title: Improving multispectral pedestrian detection by addressing modality imbalance problems
  publication-title: European Conference on Computer Vision
– start-page: 727
  year: 2022
  end-page: 744
  ident: b40
  article-title: Sliced recursive transformer
  publication-title: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXIV
– reference: S. Hwang, J. Park, N. Kim, Y. Choi, I. So Kweon, Multispectral pedestrian detection: Benchmark dataset and baseline, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1037–1045.
– volume: 85
  start-page: 161
  year: 2019
  end-page: 171
  ident: b19
  article-title: Illumination-aware faster R-CNN for robust multispectral pedestrian detection
  publication-title: Pattern Recognit.
– reference: L. Zhang, X. Zhu, X. Chen, X. Yang, Z. Lei, Z. Liu, Weakly aligned cross-modal learning for multispectral pedestrian detection, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 5127–5137.
– volume: 34
  start-page: 743
  year: 2011
  end-page: 761
  ident: b42
  article-title: Pedestrian detection: An evaluation of the state of the art
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– volume: 34
  start-page: 12116
  year: 2021
  end-page: 12128
  ident: b44
  article-title: Do vision transformers see like convolutional neural networks?
  publication-title: Adv. Neural Inf. Process. Syst.
– reference: J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
– year: 2021
  ident: b10
  article-title: Cross-modality fusion transformer for multispectral object detection
– reference: X. Bai, Z. Hu, X. Zhu, Q. Huang, Y. Chen, H. Fu, C.-L. Tai, Transfusion: Robust lidar-camera fusion for 3d object detection with transformers, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1090–1099.
– reference: H. Zhang, E. Fromont, S. Lefèvre, B. Avignon, Guided attentive feature fusion for multispectral pedestrian detection, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 72–80.
– reference: C. Li, D. Song, R. Tong, M. Tang, Multispectral Pedestrian Detection via Simultaneous Detection and Segmentation, in: British Machine Vision Conference, BMVC, 2018.
– reference: S. Woo, J. Park, J.-Y. Lee, I.S. Kweon, CBAM: Convolutional block attention module, in: Proceedings of the European Conference on Computer Vision, ECCV, 2018, pp. 3–19.
– year: 2014
  ident: b34
  article-title: Very deep convolutional networks for large-scale image recognition
– volume: 50
  start-page: 148
  year: 2019
  end-page: 157
  ident: b47
  article-title: Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection
  publication-title: Inf. Fusion
  doi: 10.1016/j.inffus.2018.11.017
– reference: X. Li, W. Wang, X. Hu, J. Yang, Selective kernel networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 510–519.
– reference: X. Xie, G. Cheng, J. Wang, X. Yao, J. Han, Oriented R-CNN for object detection, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3520–3529.
– volume: 14
  start-page: 246
  issue: 1
  year: 2021
  end-page: 252
  ident: b32
  article-title: Trear: Transformer-based RGB-D egocentric action recognition
  publication-title: IEEE Trans. Cogn. Dev. Syst.
  doi: 10.1109/TCDS.2020.3048883
– reference: T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature pyramid networks for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125.
– volume: 6
  start-page: 7846
  issue: 4
  year: 2021
  end-page: 7853
  ident: b48
  article-title: MLPD: Multi-label pedestrian detector in multispectral domain
  publication-title: IEEE Robot. Autom. Lett.
  doi: 10.1109/LRA.2021.3099870
– ident: 10.1016/j.patcog.2023.109913_b11
– volume: 28
  start-page: 265
  issue: 1
  year: 2018
  ident: 10.1016/j.patcog.2023.109913_b3
  article-title: Learning rotation-invariant and Fisher discriminative convolutional neural networks for object detection
  publication-title: IEEE Trans. Image Process.
  doi: 10.1109/TIP.2018.2867198
– volume: 17
  start-page: 1
  issue: 1
  year: 2021
  ident: 10.1016/j.patcog.2023.109913_b50
  article-title: Bottom-up and layerwise domain adaptation for pedestrian detection in thermal images
  publication-title: ACM Trans. Multimed. Comput. Commun. Appl. (TOMM)
  doi: 10.1145/3418213
– start-page: 1
  year: 2022
  ident: 10.1016/j.patcog.2023.109913_b22
  article-title: LGADet: Light-weight anchor-free multispectral pedestrian detection with mixed local and global attention
  publication-title: Neural Process. Lett.
– year: 2023
  ident: 10.1016/j.patcog.2023.109913_b45
– ident: 10.1016/j.patcog.2023.109913_b29
  doi: 10.1109/CVPR52688.2022.00116
– ident: 10.1016/j.patcog.2023.109913_b49
  doi: 10.1109/CVPRW.2019.00135
– ident: 10.1016/j.patcog.2023.109913_b39
  doi: 10.1109/ICCV.2019.00972
– ident: 10.1016/j.patcog.2023.109913_b28
  doi: 10.1109/CVPR42600.2020.01095
– volume: 32
  year: 2019
  ident: 10.1016/j.patcog.2023.109913_b9
  article-title: Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks
  publication-title: Adv. Neural Inf. Process. Syst.
– volume: 103
  year: 2022
  ident: 10.1016/j.patcog.2023.109913_b18
  article-title: Mask-guided explicit feature modulation for multispectral pedestrian detection
  publication-title: Comput. Electr. Eng.
  doi: 10.1016/j.compeleceng.2022.108385
– ident: 10.1016/j.patcog.2023.109913_b38
  doi: 10.1109/CVPR.2018.00913
– start-page: 364
  year: 2014
  ident: 10.1016/j.patcog.2023.109913_b41
  article-title: Mixed pooling for convolutional neural networks
– volume: 50
  start-page: 20
  year: 2019
  ident: 10.1016/j.patcog.2023.109913_b6
  article-title: Cross-modality interactive attention network for multispectral pedestrian detection
  publication-title: Inf. Fusion
  doi: 10.1016/j.inffus.2018.09.015
– ident: 10.1016/j.patcog.2023.109913_b23
  doi: 10.1109/CVPR.2018.00745
– ident: 10.1016/j.patcog.2023.109913_b24
  doi: 10.1109/CVPR.2019.00060
– ident: 10.1016/j.patcog.2023.109913_b30
  doi: 10.1109/CVPR52688.2022.00493
– year: 2022
  ident: 10.1016/j.patcog.2023.109913_b43
  article-title: Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery
  publication-title: Pattern Recognit.
  doi: 10.1016/j.patcog.2022.108786
– volume: 61
  start-page: 1
  year: 2023
  ident: 10.1016/j.patcog.2023.109913_b51
  article-title: SuperYOLO: Super resolution assisted object detection in multimodal remote sensing imagery
  publication-title: IEEE Trans. Geosci. Remote Sens.
– volume: 32
  start-page: 1510
  issue: 3
  year: 2021
  ident: 10.1016/j.patcog.2023.109913_b21
  article-title: Uncertainty-guided cross-modal learning for robust multispectral pedestrian detection
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
  doi: 10.1109/TCSVT.2021.3076466
– volume: 34
  start-page: 187
  year: 2016
  ident: 10.1016/j.patcog.2023.109913_b14
  article-title: Vehicle detection in aerial imagery: A small target detection benchmark
  publication-title: J. Vis. Commun. Image Represent.
  doi: 10.1016/j.jvcir.2015.11.002
– start-page: 787
  year: 2020
  ident: 10.1016/j.patcog.2023.109913_b15
  article-title: Improving multispectral pedestrian detection by addressing modality imbalance problems
– ident: 10.1016/j.patcog.2023.109913_b26
  doi: 10.1109/CVPR42600.2020.01155
– ident: 10.1016/j.patcog.2023.109913_b46
  doi: 10.1109/CVPRW.2017.36
– ident: 10.1016/j.patcog.2023.109913_b5
  doi: 10.5244/C.30.73
– volume: 116
  year: 2021
  ident: 10.1016/j.patcog.2023.109913_b8
  article-title: Adaptive spatial pixel-level feature fusion network for multispectral pedestrian detection
  publication-title: Infrared Phys. Technol.
  doi: 10.1016/j.infrared.2021.103770
– ident: 10.1016/j.patcog.2023.109913_b35
  doi: 10.1109/CVPR.2016.90
– ident: 10.1016/j.patcog.2023.109913_b37
  doi: 10.1109/CVPR.2017.106
– ident: 10.1016/j.patcog.2023.109913_b17
– ident: 10.1016/j.patcog.2023.109913_b25
  doi: 10.1007/978-3-030-01234-2_1
– ident: 10.1016/j.patcog.2023.109913_b13
– ident: 10.1016/j.patcog.2023.109913_b12
  doi: 10.1109/CVPR.2015.7298706
– ident: 10.1016/j.patcog.2023.109913_b4
  doi: 10.1109/ICCV48922.2021.00350
– ident: 10.1016/j.patcog.2023.109913_b20
  doi: 10.1109/ICCV.2019.00523
– ident: 10.1016/j.patcog.2023.109913_b7
  doi: 10.1109/WACV48630.2021.00012
– ident: 10.1016/j.patcog.2023.109913_b33
  doi: 10.1609/aaai.v36i3.20187
– ident: 10.1016/j.patcog.2023.109913_b31
  doi: 10.1109/ICCV48922.2021.00468
– year: 2020
  ident: 10.1016/j.patcog.2023.109913_b36
– volume: 135
  year: 2023
  ident: 10.1016/j.patcog.2023.109913_b2
  article-title: Center and scale prediction: Anchor-free approach for pedestrian and face detection
  publication-title: Pattern Recognit.
  doi: 10.1016/j.patcog.2022.109071
– start-page: 276
  year: 2020
  ident: 10.1016/j.patcog.2023.109913_b16
  article-title: Multispectral fusion for object detection with cyclic fuse-and-refine blocks
– volume: 66
  issue: 3
  year: 2023
  ident: 10.1016/j.patcog.2023.109913_b27
  article-title: Class attention network for image recognition
  publication-title: Sci. China Inf. Sci.
  doi: 10.1007/s11432-021-3493-7
– volume: 133
  issn: 0031-3203
  year: 2023
  ident: 10.1016/j.patcog.2023.109913_b1
  article-title: A full data augmentation pipeline for small object detection based on generative adversarial networks
  publication-title: Pattern Recognit.
  doi: 10.1016/j.patcog.2022.108998
StartPage 109913
SubjectTerms Cross-attention
Iterative feature fusion
Multispectral object detection
Transformer
Title ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection
URI https://dx.doi.org/10.1016/j.patcog.2023.109913
Volume 145
WOSCitedRecordID wos001122303900001