CEDNet: A cascade encoder–decoder network for dense prediction
| Published in: | Pattern Recognition, Vol. 158, p. 111072 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 1 February 2025 |
| Subjects: | |
| ISSN: | 0031-3203 |
| Summary: | The prevailing methods for dense prediction tasks typically utilize a heavy classification backbone to extract multi-scale features and then fuse these features using a lightweight module. However, these methods allocate most computational resources to the classification backbone, which delays the multi-scale feature fusion and potentially leads to inadequate feature fusion. Although some methods perform feature fusion from early stages, they either fail to fully leverage high-level features to guide low-level feature learning or have complex structures, resulting in sub-optimal performance. We propose a streamlined cascade encoder–decoder network, named CEDNet, tailored for dense prediction tasks. All stages in CEDNet share the same encoder–decoder structure and perform multi-scale feature fusion within each decoder, thereby enhancing the effectiveness of multi-scale feature fusion. We explored three well-known encoder–decoder structures: Hourglass, UNet, and FPN, all of which yielded promising results. Experiments on various dense prediction tasks demonstrated the effectiveness of our method. Code: https://github.com/zhanggang001/CEDNet
• We propose CEDNet, a cascade encoder–decoder network for dense prediction. A hallmark of CEDNet is its ability to incorporate high-level features from early stages to guide low-level feature learning in subsequent stages, thereby enhancing the effectiveness of multi-scale feature fusion.
• We explored three well-known encoder–decoder structures: Hourglass, UNet, and FPN. They all performed much better than traditional methods that employ a pre-designed classification backbone combined with a lightweight multi-scale feature fusion module.
• We conducted extensive experiments on object detection, instance segmentation, and semantic segmentation. The excellent performance we achieved on these tasks demonstrates the effectiveness of our method. |
|---|---|
| DOI: | 10.1016/j.patcog.2024.111072 |
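
The cascade idea in the summary (identical encoder–decoder stages, each fusing multi-scale features inside its decoder) can be illustrated with a toy sketch. This is not the authors' implementation, which is available at the repository linked above. It is a minimal pure-Python illustration under several assumptions: features are 1-D lists rather than image tensors, average pooling and nearest-neighbour repetition stand in for learned layers, the decoder uses an FPN-style top-down sum, and the finest fused map is what feeds the next stage. All function names here are hypothetical.

```python
from typing import List

Signal = List[float]


def downsample(x: Signal) -> Signal:
    """Halve the resolution by averaging adjacent pairs (stand-in for a strided layer)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x: Signal) -> Signal:
    """Double the resolution by nearest-neighbour repetition."""
    return [v for v in x for _ in range(2)]


def encoder(x: Signal, num_scales: int) -> List[Signal]:
    """Build a feature pyramid ordered from finest to coarsest scale."""
    pyramid = [x]
    for _ in range(num_scales - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def decoder(pyramid: List[Signal]) -> List[Signal]:
    """FPN-style top-down fusion: each finer level adds the upsampled coarser result."""
    fused = [pyramid[-1]]  # start from the coarsest (highest-level) features
    for level in reversed(pyramid[:-1]):
        top_down = upsample(fused[0])
        fused.insert(0, [a + b for a, b in zip(level, top_down)])
    return fused


def cascade(x: Signal, num_stages: int, num_scales: int) -> List[Signal]:
    """Run identical encoder-decoder stages in sequence.

    Assumption: the finest fused map of one stage is the input of the next,
    so high-level information from early stages reaches later stages.
    """
    outputs = [x]
    for _ in range(num_stages):
        outputs = decoder(encoder(outputs[0], num_scales))
    return outputs


# Input length should be divisible by 2 ** (num_scales - 1).
features = cascade([1.0] * 8, num_stages=2, num_scales=3)
print([len(f) for f in features])  # [8, 4, 2]
```

In this sketch, fusion happens once per stage rather than once at the end of a single backbone, which is the contrast the summary draws with backbone-plus-lightweight-fusion designs.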