CEDNet: A cascade encoder–decoder network for dense prediction

Bibliographic Details
Published in: Pattern Recognition, Vol. 158, p. 111072
Main Authors: Zhang, Gang; Li, Ziyi; Tang, Chufeng; Li, Jianmin; Hu, Xiaolin
Format: Journal Article
Language: English
Published: Elsevier Ltd, 2025-02-01
ISSN: 0031-3203
Abstract The prevailing methods for dense prediction tasks typically utilize a heavy classification backbone to extract multi-scale features and then fuse these features using a lightweight module. However, these methods allocate most computational resources to the classification backbone, which delays the multi-scale feature fusion and potentially leads to inadequate feature fusion. Although some methods perform feature fusion from early stages, they either fail to fully leverage high-level features to guide low-level feature learning or have complex structures, resulting in sub-optimal performance. We propose a streamlined cascade encoder–decoder network, named CEDNet, tailored for dense prediction tasks. All stages in CEDNet share the same encoder–decoder structure and perform multi-scale feature fusion within each decoder, thereby enhancing the effectiveness of multi-scale feature fusion. We explored three well-known encoder–decoder structures: Hourglass, UNet, and FPN, all of which yielded promising results. Experiments on various dense prediction tasks demonstrated the effectiveness of our method.¹
¹ Code: https://github.com/zhanggang001/CEDNet.
Highlights
• We propose CEDNet, a cascade encoder–decoder network for dense prediction. A hallmark of CEDNet is its ability to incorporate high-level features from early stages to guide low-level feature learning in subsequent stages, thereby enhancing the effectiveness of multi-scale feature fusion.
• We explored three well-known encoder–decoder structures: Hourglass, UNet, and FPN. They all performed much better than traditional methods that employ a pre-designed classification backbone combined with a lightweight multi-scale feature fusion module.
• We conducted extensive experiments on object detection, instance segmentation, and semantic segmentation. The excellent performance we achieved on these tasks demonstrates the effectiveness of our method.
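The abstract describes the architecture only at a high level: every stage is the same encoder–decoder, and each decoder fuses all scales before its features reach the next stage. The toy sketch below illustrates that dataflow in plain Python under stated assumptions; it is not the authors' implementation (the official code is in the repository linked in the abstract). Assumptions: feature maps are single-channel grids, `block` is a hypothetical stand-in for a learned layer, the decoder uses FPN-style top-down addition (one of the three structures the abstract names), and the way later stages consume the previous decoder's outputs (starting from its finest map and re-adding its coarser maps at matching scales) is an illustrative guess, not a detail taken from the paper. All function names are hypothetical.

```python
from typing import List, Optional

Grid = List[List[float]]  # a single-channel feature map, stored row by row


def avg_pool2(x: Grid) -> Grid:
    """Halve the resolution with 2x2 average pooling (height and width must be even)."""
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


def upsample2(x: Grid) -> Grid:
    """Double the resolution with nearest-neighbour upsampling."""
    return [[v for v in row for _ in (0, 1)] for row in x for _ in (0, 1)]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps of the same shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def block(x: Grid, weight: float) -> Grid:
    """Hypothetical stand-in for a learned layer: scale every value, then apply ReLU."""
    return [[max(0.0, weight * v) for v in row] for row in x]


def encoder(x: Grid, prev: Optional[List[Grid]], num_scales: int, w: float) -> List[Grid]:
    """Bottom-up pass that returns one map per scale, finest first."""
    feats: List[Grid] = []
    for s in range(num_scales):
        if s > 0:
            x = avg_pool2(x)
            if prev is not None:
                # Assumption: add the previous stage's decoder output at this scale,
                # so its fused high-level features guide this stage's encoder.
                x = add(x, prev[s])
        x = block(x, w)
        feats.append(x)
    return feats


def fpn_decoder(feats: List[Grid], w: float) -> List[Grid]:
    """FPN-style top-down fusion: upsample the coarser output and add it to the finer map."""
    outs: List[Grid] = [feats[-1]]
    for s in range(len(feats) - 2, -1, -1):
        outs.insert(0, block(add(feats[s], upsample2(outs[0])), w))
    return outs


def cascade_encoder_decoder(stem: Grid, num_stages: int = 3,
                            num_scales: int = 3, w: float = 0.5) -> List[Grid]:
    """Chain identical encoder-decoder stages; every decoder fuses all scales."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    prev: Optional[List[Grid]] = None
    for _ in range(num_stages):
        # Assumption: later stages start from the finest output of the previous decoder.
        x = stem if prev is None else prev[0]
        prev = fpn_decoder(encoder(x, prev, num_scales, w), w)
    return prev
```

Calling `cascade_encoder_decoder` on an 8×8 map with three scales returns maps of size 8×8, 4×4 and 2×2, finest first. Because fusion happens inside every stage rather than once after a backbone, coarse features influence fine-scale processing from the second stage onward, which is the property the highlights emphasize. The abstract reports that Hourglass- and UNet-style structures also work; in this sketch, trying them would mean swapping out `fpn_decoder` (and possibly `encoder`).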
ArticleNumber 111072
Author Li, Ziyi
Zhang, Gang
Li, Jianmin
Hu, Xiaolin
Tang, Chufeng
Author_xml – sequence: 1
  givenname: Gang
  orcidid: 0009-0003-4598-8307
  surname: Zhang
  fullname: Zhang, Gang
  organization: Department of Computer Science and Technology, Institute for AI, BNRist, THU-Bosch JCML Center, Tsinghua Laboratory of Brain and Intelligence (THBI), IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, 100084, China
– sequence: 2
  givenname: Ziyi
  surname: Li
  fullname: Li, Ziyi
  organization: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
– sequence: 3
  givenname: Chufeng
  surname: Tang
  fullname: Tang, Chufeng
  organization: Department of Computer Science and Technology, Institute for AI, BNRist, THU-Bosch JCML Center, Tsinghua Laboratory of Brain and Intelligence (THBI), IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, 100084, China
– sequence: 4
  givenname: Jianmin
  surname: Li
  fullname: Li, Jianmin
  organization: Department of Computer Science and Technology, Institute for AI, BNRist, THU-Bosch JCML Center, Tsinghua Laboratory of Brain and Intelligence (THBI), IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, 100084, China
– sequence: 5
  givenname: Xiaolin
  orcidid: 0000-0002-4907-7354
  surname: Hu
  fullname: Hu, Xiaolin
  email: xlhu@mail.tsinghua.edu.cn
  organization: Department of Computer Science and Technology, Institute for AI, BNRist, THU-Bosch JCML Center, Tsinghua Laboratory of Brain and Intelligence (THBI), IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, 100084, China
CitedBy_id crossref_primary_10_1080_10589759_2025_2474102
crossref_primary_10_1002_jemt_70039
crossref_primary_10_1371_journal_pone_0321270
crossref_primary_10_3390_rs17050802
crossref_primary_10_1016_j_inffus_2025_103530
crossref_primary_10_3390_agronomy15071508
crossref_primary_10_1109_TSC_2025_3586094
crossref_primary_10_3390_jmse13010044
crossref_primary_10_1016_j_aiia_2025_06_006
crossref_primary_10_1109_ACCESS_2025_3550888
crossref_primary_10_1109_ACCESS_2025_3573719
Cites_doi 10.1109/CVPR.2017.544
10.1109/CVPR.2019.00720
10.1109/CVPR52688.2022.01181
10.1109/ICCV48922.2021.00986
10.1109/ICCV51070.2023.00355
10.1109/CVPR46437.2021.01008
10.1007/978-3-319-24574-4_28
10.1109/CVPR.2018.00644
10.1109/CVPR.2018.00913
10.1007/978-3-030-01234-2_49
10.1007/978-3-319-10602-1_48
10.1016/j.patcog.2023.109432
10.1007/978-3-030-01228-1_26
10.1016/j.patcog.2020.107404
10.1016/j.patcog.2024.110336
10.1109/CVPR46437.2021.00294
10.1109/CVPR52729.2023.01385
10.1109/CVPR.2009.5206848
10.1007/978-3-319-46484-8_29
10.1109/CVPR52688.2022.01175
ContentType Journal Article
Copyright 2024 Elsevier Ltd
DOI 10.1016/j.patcog.2024.111072
Discipline Computer Science
ExternalDocumentID 10_1016_j_patcog_2024_111072
S0031320324008239
ISICitedReferencesCount 13
ISSN 0031-3203
IsPeerReviewed true
IsScholarly true
Keywords Instance segmentation
Cascade encoder–decoder
Semantic segmentation
Dense prediction
Multi-scale feature fusion
Object detection
Language English
ORCID 0009-0003-4598-8307
0000-0002-4907-7354
PublicationCentury 2000
PublicationDate February 2025
2025-02-00
PublicationDateYYYYMMDD 2025-02-01
PublicationDate_xml – month: 02
  year: 2025
  text: February 2025
PublicationDecade 2020
PublicationTitle Pattern recognition
PublicationYear 2025
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
References Wang, Sun, Cheng, Jiang, Deng, Zhao, Liu, Mu, Tan, Wang, Liu, Xiao (b10) 2021
Xu, Zhang, Zhang, Tao (b38) 2022
Jiemin Fang, Lingxi Xie, Xinggang Wang, Xiaopeng Zhang, Wenyu Liu, Qi Tian, MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2022, pp. 12063–12072.
Yang, Li, Dai, Gao, Focal Modulation Networks, in: Advances in Neural Information Processing Systems, NeurIPS, 2022.
Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song, SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2020.
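Yuan, Fu, Huang, Lin, Zhang, Chen, Wang, HRFormer: High-Resolution Transformer for Dense Prediction, in: Advances in Neural Information Processing Systems, NeurIPS, 2021.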
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick, Microsoft COCO: Common Objects in Context, in: Proceedings of the European Conference on Computer Vision, ECCV, 2014, pp. 740–755.
Yan, Liu, Xu, Dong, Li, Shi, Zhang, Dai, 3D Medical Image Segmentation Using Parallel Transformers, Pattern Recognition 138 (2023).
Barret Zoph, Quoc V. Le, Neural Architecture Search with Reinforcement Learning, in: International Conference on Learning Representations, ICLR, 2017.
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2009, pp. 248–255.
Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, Barret Zoph, Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2021, pp. 2918–2928.
Mingxing Tan, Ruoming Pang, Quoc V. Le, EfficientDet: Scalable and Efficient Object Detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2020, pp. 10781–10790.
Hendrycks, Gimpel, Gaussian Error Linear Units (GELUs), arXiv preprint, 2016.
Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba, Scene Parsing Through ADE20K Dataset, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2017, pp. 633–641.
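Yang, Li, Zhang, Dai, Xiao, Yuan, Gao, Focal Self-Attention for Local-Global Interactions in Vision Transformers, in: Advances in Neural Information Processing Systems, NeurIPS, 2021, pp. 30008–30022.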
Yuxuan Cai, Yizhuang Zhou, Qi Han, Jianjian Sun, Xiangwen Kong, Jun Li, Xiangyu Zhang, Reversible Column Networks, in: International Conference on Learning Representations, ICLR, 2023.
Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun, Unified Perceptual Parsing for Scene Understanding, in: Proceedings of the European Conference on Computer Vision, ECCV, 2018.
Olaf Ronneberger, Philipp Fischer, Thomas Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, in: International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI, 2015, pp. 234–241.
Ba, Kiros, Hinton, Layer Normalization, arXiv preprint, 2016.
Alejandro Newell, Kaiyu Yang, Jia Deng, Stacked Hourglass Networks for Human Pose Estimation, in: Proceedings of the European Conference on Computer Vision, ECCV, 2016, pp. 483–499.
Lvmin Zhang, Anyi Rao, Maneesh Agrawala, Adding Conditional Control to Text-to-Image Diffusion Models, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV, 2023, pp. 3836–3847.
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai, Deformable DETR: Deformable Transformers for End-to-End Object Detection, in: International Conference on Learning Representations, ICLR, 2021.
Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, Jiaya Jia, Path Aggregation Network for Instance Segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2018, pp. 8759–8768.
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár, Focal Loss for Dense Object Detection, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV, 2017, pp. 2980–2988.
Zhaowei Cai, Nuno Vasconcelos, Cascade R-CNN: Delving Into High Quality Object Detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2018, pp. 6154–6162.
Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie, A ConvNet for the 2020s, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2022, pp. 11976–11986.
Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam, Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, in: Proceedings of the European Conference on Computer Vision, ECCV, 2018, pp. 833–851.
Qin, Zhang, Huang, Dehghan, Zaiane, Jagersand, U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection, Pattern Recognition 106 (2020).
Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, et al., InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2023, pp. 14408–14419.
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo, Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV, 2021, pp. 10012–10022.
Shiwei Liu, Tianlong Chen, Xiaohan Chen, Xuxi Chen, Qiao Xiao, Boqian Wu, Tommi Kärkkäinen, Mykola Pechenizkiy, Decebal Constantin Mocanu, Zhangyang Wang, More ConvNets in the 2020s: Scaling up Kernels Beyond 51 × 51 using Sparsity, in: International Conference on Learning Representations, ICLR, 2023.
Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, Baining Guo, CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2022, pp. 12124–12134.
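Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie, Feature Pyramid Networks for Object Detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2017, pp. 2117–2125.
Liu, Wang, Wang, Liang, Zhao, Tang, Ling, CBNet: A Novel Composite Backbone Network Architecture for Object Detection, in: Proceedings of the AAAI Conference on Artificial Intelligence, AAAI, 2020, pp. 11653–11660.
Cao, Sun, Wang, Geng, Fu, Yin, Pan, RASNet: Renal Automatic Segmentation Using an Improved U-Net with Multi-Scale Perception and Attention Unit, Pattern Recognition 150 (2024).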
Siyuan Qiao, Liang-Chieh Chen, Alan Yuille, DetectoRS: Detecting Objects With Recursive Feature Pyramid and Switchable Atrous Convolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2021, pp. 10213–10224.
Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, Mask R-CNN, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV, 2017, pp. 2961–2969.
Yiqi Jiang, Zhiyu Tan, Junyan Wang, Xiuyu Sun, Ming Lin, Hao Li, GiraffeDet: A Heavy-Neck Paradigm for Object Detection, in: International Conference on Learning Representations, ICLR, 2022.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2016, pp. 770–778.
Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, Quoc V. Le, NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, 2019, pp. 7036–7045.
StartPage 111072
SubjectTerms Cascade encoder–decoder
Dense prediction
Instance segmentation
Multi-scale feature fusion
Object detection
Semantic segmentation
Title CEDNet: A cascade encoder–decoder network for dense prediction
URI https://dx.doi.org/10.1016/j.patcog.2024.111072
Volume 158