CEDNet: A cascade encoder–decoder network for dense prediction
| Published in: | Pattern recognition Vol. 158; p. 111072 |
|---|---|
| Main Authors: | Zhang, Gang; Li, Ziyi; Tang, Chufeng; Li, Jianmin; Hu, Xiaolin |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.02.2025 |
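| Subjects: | Cascade encoder–decoder; Dense prediction; Instance segmentation; Multi-scale feature fusion; Object detection; Semantic segmentation |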
| ISSN: | 0031-3203 |
| Abstract | The prevailing methods for dense prediction tasks typically utilize a heavy classification backbone to extract multi-scale features and then fuse these features using a lightweight module. However, these methods allocate most computational resources to the classification backbone, which delays the multi-scale feature fusion and potentially leads to inadequate feature fusion. Although some methods perform feature fusion from early stages, they either fail to fully leverage high-level features to guide low-level feature learning or have complex structures, resulting in sub-optimal performance. We propose a streamlined cascade encoder–decoder network, named CEDNet, tailored for dense prediction tasks. All stages in CEDNet share the same encoder–decoder structure and perform multi-scale feature fusion within each decoder, thereby enhancing the effectiveness of multi-scale feature fusion. We explored three well-known encoder–decoder structures: Hourglass, UNet, and FPN, all of which yielded promising results. Experiments on various dense prediction tasks demonstrated the effectiveness of our method. Code: https://github.com/zhanggang001/CEDNet.
• We propose CEDNet, a cascade encoder–decoder network for dense prediction. A hallmark of CEDNet is its ability to incorporate high-level features from early stages to guide low-level feature learning in subsequent stages, thereby enhancing the effectiveness of multi-scale feature fusion. • We explored three well-known encoder–decoder structures: Hourglass, UNet, and FPN. They all performed much better than traditional methods that employ a pre-designed classification backbone combined with a lightweight multi-scale feature fusion module. • We conducted extensive experiments on object detection, instance segmentation, and semantic segmentation. The excellent performance we achieved on these tasks demonstrates the effectiveness of our method. |
|---|---|
| ArticleNumber | 111072 |
| Author | Li, Ziyi; Zhang, Gang; Li, Jianmin; Hu, Xiaolin; Tang, Chufeng |
| Author_xml | – sequence: 1 givenname: Gang orcidid: 0009-0003-4598-8307 surname: Zhang fullname: Zhang, Gang organization: Department of Computer Science and Technology, Institute for AI, BNRist, THU-Bosch JCML Center, Tsinghua Laboratory of Brain and Intelligence (THBI), IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, 100084, China – sequence: 2 givenname: Ziyi surname: Li fullname: Li, Ziyi organization: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China – sequence: 3 givenname: Chufeng surname: Tang fullname: Tang, Chufeng organization: Department of Computer Science and Technology, Institute for AI, BNRist, THU-Bosch JCML Center, Tsinghua Laboratory of Brain and Intelligence (THBI), IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, 100084, China – sequence: 4 givenname: Jianmin surname: Li fullname: Li, Jianmin organization: Department of Computer Science and Technology, Institute for AI, BNRist, THU-Bosch JCML Center, Tsinghua Laboratory of Brain and Intelligence (THBI), IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, 100084, China – sequence: 5 givenname: Xiaolin orcidid: 0000-0002-4907-7354 surname: Hu fullname: Hu, Xiaolin email: xlhu@mail.tsinghua.edu.cn organization: Department of Computer Science and Technology, Institute for AI, BNRist, THU-Bosch JCML Center, Tsinghua Laboratory of Brain and Intelligence (THBI), IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, 100084, China |
| ContentType | Journal Article |
| Copyright | 2024 Elsevier Ltd |
| Copyright_xml | – notice: 2024 Elsevier Ltd |
| DOI | 10.1016/j.patcog.2024.111072 |
| Discipline | Computer Science |
| ExternalDocumentID | 10_1016_j_patcog_2024_111072 S0031320324008239 |
| ISICitedReferencesCount | 13 |
| ISSN | 0031-3203 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Instance segmentation; Cascade encoder–decoder; Semantic segmentation; Dense prediction; Multi-scale feature fusion; Object detection |
| Language | English |
| ORCID | 0009-0003-4598-8307 0000-0002-4907-7354 |
| PublicationCentury | 2000 |
| PublicationDate | February 2025 2025-02-00 |
| PublicationDateYYYYMMDD | 2025-02-01 |
| PublicationDate_xml | – month: 02 year: 2025 text: February 2025 |
| PublicationDecade | 2020 |
| PublicationTitle | Pattern recognition |
| PublicationYear | 2025 |
| Publisher | Elsevier Ltd |
| Publisher_xml | – name: Elsevier Ltd |
| StartPage | 111072 |
| SubjectTerms | Cascade encoder–decoder; Dense prediction; Instance segmentation; Multi-scale feature fusion; Object detection; Semantic segmentation |
| Title | CEDNet: A cascade encoder–decoder network for dense prediction |
| URI | https://dx.doi.org/10.1016/j.patcog.2024.111072 |
| Volume | 158 |
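
The abstract in this record describes a cascade in which every stage has the same encoder–decoder structure and fuses multi-scale features inside its decoder. The toy sketch below is one possible reading of that data flow, using an FPN-style top-down decoder on 1-D lists. It is not the authors' implementation: all function names are hypothetical, learned layers are omitted entirely, and plain averaging, repetition, and addition stand in for the real downsampling, upsampling, and fusion operations.

```python
# Toy 1-D sketch of cascaded FPN-style encoder-decoder stages.
# Illustrative only: names and operations are assumptions, not CEDNet code.

def downsample(x):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def upsample(x):
    """Double the resolution by repeating each value (nearest neighbour)."""
    return [v for v in x for _ in range(2)]

def encoder(x, num_scales):
    """Return features ordered from fine to coarse."""
    feats = [x]
    for _ in range(num_scales - 1):
        feats.append(downsample(feats[-1]))
    return feats

def decoder(feats):
    """Top-down fusion: upsample the coarser result and add it to the finer feature."""
    fused = [feats[-1]]
    for finer in reversed(feats[:-1]):
        top = upsample(fused[0])
        fused.insert(0, [a + b for a, b in zip(finer, top)])
    return fused  # fine to coarse, same layout as the encoder output

def cascade(x, num_stages=3, num_scales=3):
    """Run identical encoder-decoder stages back to back."""
    outputs = None
    for _ in range(num_stages):
        outputs = decoder(encoder(x, num_scales))
        # The fused finest feature already carries coarse-scale information
        # and becomes the input of the next stage.
        x = outputs[0]
    return outputs

if __name__ == "__main__":
    # Input length must be divisible by 2 ** (num_scales - 1).
    for scale in cascade([1.0] * 8, num_stages=2, num_scales=3):
        print(len(scale), scale[0])
```

The point of the sketch is the loop in `cascade`: because fusion happens inside every stage rather than once after a long backbone, later stages start from features that already mix scales.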