RAFNet: Reparameterizable Across-Resolution Fusion Network for Real-Time Image Semantic Segmentation

Detailed bibliography
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Volume 34, Issue 2, pp. 1212-1227
Main authors: Chen, Lei; Dai, Huhe; Zheng, Yuan
Format: Journal Article
Language: English
Published: New York: IEEE, 1 February 2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1051-8215, 1558-2205
Abstract The demand to deploy semantic segmentation networks on mobile devices has increased dramatically. However, existing real-time semantic segmentation methods still involve a large number of network parameters, which makes them unsuitable for mobile devices with limited memory resources. This mainly stems from the fact that most existing methods adopt backbone networks (e.g., ResNet-18 and MobileNet) as the encoder. To alleviate this problem, we propose a novel Reparameterizable Channel & Dilation (RCD) block and construct a considerably lightweight yet effective encoder by stacking several RCD blocks according to three guidelines. The proposed encoder not only extracts discriminative feature representations via channel convolutions and dilated convolutions, but also reduces the computational burden while maintaining segmentation accuracy with the help of the re-parameterization technique. In addition to the encoder, we present a simple but effective decoder that adopts an across-resolution fusion strategy to fuse the multi-scale feature maps generated by the encoder, instead of a bottom-up pathway fusion. With such an encoder and decoder, we present the Reparameterizable Across-resolution Fusion Network (RAFNet) for real-time semantic segmentation. Extensive experiments demonstrate that RAFNet achieves a promising trade-off among segmentation accuracy, inference speed, and network parameters. Specifically, with only 0.96M parameters, RAFNet obtains 75.3% mIoU at 107 FPS on the Cityscapes test set and 75.8% mIoU at 195 FPS on the CamVid test set for full-resolution inputs. After quantization and deployment on a Xilinx ZCU104 device, RAFNet achieves favorable segmentation performance with only 1.4 W of power.
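The abstract attributes the small parameter count to structural re-parameterization: training-time branches of channel and dilated convolutions are merged into a single convolution for inference. The sketch below is only a minimal illustration of that general idea, not the authors' RCD block (whose exact branch layout is defined in the paper, not here); it fuses a hypothetical plain 3x3 branch and a dilated 3x3 branch into one equivalent convolution, and all class and function names are invented for illustration.

```python
# Minimal PyTorch sketch of two-branch structural re-parameterization.
# Assumption: the block is just plain3x3(x) + dilated3x3(x); the real RCD
# block in the paper is more elaborate.
import torch
import torch.nn as nn
import torch.nn.functional as F


def dilate_kernel(weight: torch.Tensor, dilation: int) -> torch.Tensor:
    """Expand a k x k kernel with dilation d into an equivalent dense
    (d*(k-1)+1) x (d*(k-1)+1) kernel by inserting zeros between taps."""
    out_c, in_c, k, _ = weight.shape
    size = dilation * (k - 1) + 1
    expanded = weight.new_zeros(out_c, in_c, size, size)
    expanded[:, :, ::dilation, ::dilation] = weight
    return expanded


class ReparamTwoBranchConv(nn.Module):
    """Training-time block: a plain 3x3 branch plus a dilated 3x3 branch."""

    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        self.plain = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
        self.dilated = nn.Conv2d(channels, channels, 3, padding=dilation,
                                 dilation=dilation, bias=True)
        self.dilation = dilation

    def forward(self, x):
        return self.plain(x) + self.dilated(x)

    def reparameterize(self) -> nn.Conv2d:
        """Fuse both branches into a single conv for inference."""
        size = self.dilation * 2 + 1              # 5x5 when dilation = 2 (kernel fixed at 3)
        pad = (size - 3) // 2
        plain_w = F.pad(self.plain.weight, [pad] * 4)          # zero-pad 3x3 -> size x size
        dilated_w = dilate_kernel(self.dilated.weight, self.dilation)
        fused = nn.Conv2d(self.plain.in_channels, self.plain.out_channels,
                          size, padding=size // 2, bias=True)
        fused.weight.data = plain_w + dilated_w
        fused.bias.data = self.plain.bias.data + self.dilated.bias.data
        return fused


if __name__ == "__main__":
    # Quick numerical check that training-time and fused outputs match.
    block = ReparamTwoBranchConv(8).eval()
    x = torch.randn(1, 8, 32, 32)
    with torch.no_grad():
        assert torch.allclose(block(x), block.reparameterize()(x), atol=1e-5)
```

At inference only the fused convolution is kept, which is how a reparameterizable design can cut parameters and compute without changing the trained function.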
Author Dai, Huhe
Chen, Lei
Zheng, Yuan
Author_xml – sequence: 1
  givenname: Lei
  surname: Chen
  fullname: Chen, Lei
  email: 32056105@mail.imu.edu.cn
  organization: College of Electronic Information Engineering, Inner Mongolia University, Hohhot, China
– sequence: 2
  givenname: Huhe
  surname: Dai
  fullname: Dai, Huhe
  email: daihuhe@imu.edu.cn
  organization: College of Electronic Information Engineering, Inner Mongolia University, Hohhot, China
– sequence: 3
  givenname: Yuan
  orcidid: 0000-0002-7632-6846
  surname: Zheng
  fullname: Zheng, Yuan
  email: zhengyuan@imu.edu.cn
  organization: National and Local Joint Engineering Research Center of Intelligent Information Processing Technology for Mongolian, College of Computer Science, Inner Mongolia University, Hohhot, China
CODEN ITCTEM
CitedBy_id crossref_primary_10_1109_TCAD_2024_3491015
crossref_primary_10_1109_TCSVT_2024_3483191
crossref_primary_10_1007_s00371_025_04130_1
crossref_primary_10_1109_TCSVT_2024_3457622
crossref_primary_10_1109_TCSVT_2024_3427720
crossref_primary_10_1109_TITS_2024_3519162
crossref_primary_10_1007_s00371_025_03853_5
crossref_primary_10_1007_s00530_025_01923_1
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DOI 10.1109/TCSVT.2023.3293166
Discipline Engineering
EISSN 1558-2205
EndPage 1227
ExternalDocumentID 10_1109_TCSVT_2023_3293166
10175551
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61962043
  funderid: 10.13039/501100001809
– fundername: Natural Science Foundation of Inner Mongolia
  grantid: 2019MS06012
  funderid: 10.13039/501100004763
ISICitedReferencesCount 10
ISSN 1051-8215
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0002-7632-6846
PQID 2923122821
PQPubID 85433
PageCount 16
PublicationCentury 2000
PublicationDate 2024-02-01
PublicationDateYYYYMMDD 2024-02-01
PublicationDate_xml – month: 02
  year: 2024
  text: 2024-02-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on circuits and systems for video technology
PublicationTitleAbbrev TCSVT
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 1212
SubjectTerms Coders
Computer networks
Decoding
Electronic devices
encoder-decoder structure
Feature extraction
Feature maps
hardware deployment
Image segmentation
lightweight network
Memory devices
Mobile handsets
Parameterization
Parameters
Real time
Real-time image segmentation
Real-time systems
Semantic segmentation
Semantics
Task analysis
Training
Title RAFNet: Reparameterizable Across-Resolution Fusion Network for Real-Time Image Semantic Segmentation
URI https://ieeexplore.ieee.org/document/10175551
https://www.proquest.com/docview/2923122821
Volume 34