CEDNet: A cascade encoder–decoder network for dense prediction


Bibliographic Details
Published in: Pattern recognition Vol. 158; p. 111072
Main Authors: Zhang, Gang; Li, Ziyi; Tang, Chufeng; Li, Jianmin; Hu, Xiaolin
Format: Journal Article
Language: English
Published: Elsevier Ltd 1 February 2025
ISSN:0031-3203
Description
Summary: The prevailing methods for dense prediction tasks typically utilize a heavy classification backbone to extract multi-scale features and then fuse these features using a lightweight module. However, these methods allocate most computational resources to the classification backbone, which delays the multi-scale feature fusion and potentially leads to inadequate feature fusion. Although some methods perform feature fusion from early stages, they either fail to fully leverage high-level features to guide low-level feature learning or have complex structures, resulting in sub-optimal performance. We propose a streamlined cascade encoder–decoder network, named CEDNet, tailored for dense prediction tasks. All stages in CEDNet share the same encoder–decoder structure and perform multi-scale feature fusion within each decoder, thereby enhancing the effectiveness of multi-scale feature fusion. We explored three well-known encoder–decoder structures: Hourglass, UNet, and FPN, all of which yielded promising results. Experiments on various dense prediction tasks demonstrated the effectiveness of our method.
Code: https://github.com/zhanggang001/CEDNet
• We propose CEDNet, a cascade encoder–decoder network for dense prediction. A hallmark of CEDNet is its ability to incorporate high-level features from early stages to guide low-level feature learning in subsequent stages, thereby enhancing the effectiveness of multi-scale feature fusion.
• We explored three well-known encoder–decoder structures: Hourglass, UNet, and FPN. They all performed much better than traditional methods that employ a pre-designed classification backbone combined with a lightweight multi-scale feature fusion module.
• We conducted extensive experiments on object detection, instance segmentation, and semantic segmentation. The excellent performance we achieved on these tasks demonstrates the effectiveness of our method.
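The cascade structure described in the summary can be pictured with a toy sketch. This is not the authors' implementation (see the repository linked in the summary for that). It assumes 1-D lists as stand-ins for feature maps, average pooling and nearest-neighbour upsampling in place of learned layers, and an FPN-like additive top-down decoder. The names `encode`, `decode`, and `cascade` are hypothetical and chosen only for illustration.

```python
from typing import List

Feature = List[float]  # toy 1-D stand-in for a feature map (assumption)


def downsample(x: Feature) -> Feature:
    """Halve the resolution by averaging adjacent pairs."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x: Feature) -> Feature:
    """Double the resolution by nearest-neighbour repetition."""
    return [v for v in x for _ in range(2)]


def encode(x: Feature, num_scales: int) -> List[Feature]:
    """Build a pyramid ordered from fine (index 0) to coarse (last index)."""
    pyramid = [x]
    for _ in range(num_scales - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def decode(pyramid: List[Feature]) -> List[Feature]:
    """FPN-like fusion: each finer level adds the upsampled coarser output."""
    fused = [pyramid[-1]]
    for level in reversed(pyramid[:-1]):
        top_down = upsample(fused[0])
        fused.insert(0, [a + b for a, b in zip(level, top_down)])
    return fused


def cascade(x: Feature, num_stages: int = 3, num_scales: int = 3) -> List[Feature]:
    """Run identical encoder-decoder stages back to back.

    The finest fused map of one stage is the input of the next stage, so
    coarse (high-level) information from earlier stages reaches the
    fine-level features computed later.
    """
    fused: List[Feature] = []
    for _ in range(num_stages):
        fused = decode(encode(x, num_scales))
        x = fused[0]
    return fused


if __name__ == "__main__":
    # Input length must be divisible by 2 ** (num_scales - 1).
    outputs = cascade([1.0] * 8, num_stages=2, num_scales=3)
    print([len(f) for f in outputs])
```

The point of the sketch is only the control flow: fusion happens inside every stage rather than once after a single heavy backbone. A real network would replace the pooling and addition with learned blocks.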
DOI:10.1016/j.patcog.2024.111072