RAFNet: Reparameterizable Across-Resolution Fusion Network for Real-Time Image Semantic Segmentation
The demand to implement semantic segmentation networks on mobile devices has increased dramatically. However, existing real-time semantic segmentation methods still suffer from a large number of network parameters, unsuitable for mobile devices with limited memory resources. The reason mainly arises...
Saved in:
| Published in: | IEEE transactions on circuits and systems for video technology Vol. 34; no. 2; pp. 1212 - 1227 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: |
New York
IEEE
01.02.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 1051-8215, 1558-2205 |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | The demand to implement semantic segmentation networks on mobile devices has increased dramatically. However, existing real-time semantic segmentation methods still suffer from a large number of network parameters, unsuitable for mobile devices with limited memory resources. The reason mainly arises from the fact that most existing methods take the backbone networks (e.g., ResNet-18 and MobileNet) as an encoder. To alleviate this problem, we propose a novel Reparameterizable Channel & Dilation (RCD) block and construct a considerably lightweight yet effective encoder by stacking several RCD blocks according to three guidelines. The strengths of the proposed encoder result in the abilities not only to extract discriminative feature representations via channel convolutions and dilated convolutions, but also to reduce computational burdens while maintaining segmentation accuracy with the help of re-parameterization technique. Except for encoder, we also present a simple but effective decoder that adopts an across-resolution fusion strategy to fuse multi-scale feature maps generated from the encoder instead of a bottom-up pathway fusion. With such an encoder and a decoder, we provide a Reparameterizable Across-resolution Fusion Network (RAFNet) for real-time semantic segmentation. Extensive experiments demonstrate that our RAFNet achieves a promising trade-off between segmentation accuracy, inference speed and network parameters. Specifically, our RAFNet with only 0.96M parameters obtains 75.3% mIoU at 107 FPS and 75.8% mIoU at 195 FPS on Cityscapes and CamVid test sets for full-resolution inputs, respectively. After quantization and deployment on a Xilinx ZCU104 device, our RAFNet obtains a favorable segmentation performance with only 1.4W power. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 1051-8215 1558-2205 |
| DOI: | 10.1109/TCSVT.2023.3293166 |