CEDNet: A cascade encoder–decoder network for dense prediction

Bibliographic Details
Published in: Pattern Recognition, Vol. 158, p. 111072
Main Authors: Zhang, Gang; Li, Ziyi; Tang, Chufeng; Li, Jianmin; Hu, Xiaolin
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.02.2025
ISSN: 0031-3203
Description
Summary: The prevailing methods for dense prediction tasks typically utilize a heavy classification backbone to extract multi-scale features and then fuse these features using a lightweight module. However, these methods allocate most computational resources to the classification backbone, which delays the multi-scale feature fusion and potentially leads to inadequate feature fusion. Although some methods perform feature fusion from early stages, they either fail to fully leverage high-level features to guide low-level feature learning or have complex structures, resulting in sub-optimal performance. We propose a streamlined cascade encoder–decoder network, named CEDNet, tailored for dense prediction tasks. All stages in CEDNet share the same encoder–decoder structure and perform multi-scale feature fusion within each decoder, thereby enhancing the effectiveness of multi-scale feature fusion. We explored three well-known encoder–decoder structures: Hourglass, UNet, and FPN, all of which yielded promising results. Experiments on various dense prediction tasks demonstrated the effectiveness of our method. Code: https://github.com/zhanggang001/CEDNet.
• We propose CEDNet, a cascade encoder–decoder network for dense prediction. A hallmark of CEDNet is its ability to incorporate high-level features from early stages to guide low-level feature learning in subsequent stages, thereby enhancing the effectiveness of multi-scale feature fusion.
• We explored three well-known encoder–decoder structures: Hourglass, UNet, and FPN. They all performed much better than traditional methods that employ a pre-designed classification backbone combined with a lightweight multi-scale feature fusion module.
• We conducted extensive experiments on object detection, instance segmentation, and semantic segmentation. The excellent performance we achieved on these tasks demonstrates the effectiveness of our method.
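
The abstract describes stacking several identical encoder–decoder stages so that multi-scale fusion is performed repeatedly, with high-level features produced by earlier stages guiding low-level feature learning in later ones. The sketch below illustrates that idea with an FPN-style stage in PyTorch; the module names (FPNStage, CEDNetSketch), the stem, the channel widths, and the number of stages and scales are illustrative assumptions and not the authors' released implementation (see the linked repository for the official code).

```python
# Minimal sketch of a cascade encoder-decoder, assuming an FPN-style stage.
# All names, widths, and depths here are illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FPNStage(nn.Module):
    """One stage: a small encoder (stride-2 convs) followed by top-down fusion."""

    def __init__(self, channels: int, num_scales: int = 3):
        super().__init__()
        # Encoder: one stride-2 conv per scale transition.
        self.down = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, stride=2, padding=1)
             for _ in range(num_scales - 1)]
        )
        # Decoder: one 3x3 conv per scale to smooth the fused features.
        self.smooth = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_scales)]
        )

    def forward(self, feats):
        # feats: list of multi-scale maps, highest resolution first.
        enc = [feats[0]]
        for i, conv in enumerate(self.down):
            enc.append(conv(enc[-1]) + feats[i + 1])
        # Top-down fusion: high-level context guides lower-level features.
        out = [self.smooth[-1](enc[-1])]
        for i in range(len(enc) - 2, -1, -1):
            up = F.interpolate(out[0], size=enc[i].shape[-2:], mode="nearest")
            out.insert(0, self.smooth[i](enc[i] + up))
        return out


class CEDNetSketch(nn.Module):
    """Stack identical encoder-decoder stages; every stage re-fuses all scales."""

    def __init__(self, in_channels: int = 3, channels: int = 64,
                 num_scales: int = 3, num_stages: int = 3):
        super().__init__()
        # Stem producing an initial pyramid (strides 4, 8, 16 in this sketch).
        self.stem = nn.Conv2d(in_channels, channels, 7, stride=4, padding=3)
        self.pyramid = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, stride=2, padding=1)
             for _ in range(num_scales - 1)]
        )
        self.stages = nn.ModuleList(
            [FPNStage(channels, num_scales) for _ in range(num_stages)]
        )

    def forward(self, x):
        feats = [self.stem(x)]
        for conv in self.pyramid:
            feats.append(conv(feats[-1]))
        for stage in self.stages:
            feats = stage(feats)  # multi-scale fusion happens in every stage
        return feats


if __name__ == "__main__":
    model = CEDNetSketch()
    outs = model(torch.randn(1, 3, 256, 256))
    print([o.shape for o in outs])  # three maps at strides 4, 8, 16
```

The point of the cascade is visible in the loop over stages: because fusion is not deferred to a single lightweight neck after a heavy backbone, the high-level features computed by stage k already inform the low-level features refined by stage k+1.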
DOI: 10.1016/j.patcog.2024.111072