Deep Road Scene Understanding

Road scene understanding is a difficult task in autonomous driving. In this letter, we propose a novel deep encoder-decoder architecture for road scene understanding in an end-to-end manner. This core trainable understanding engine includes an encoder network, a decoder network with two streams, and...

Bibliographic details
Published in: IEEE Signal Processing Letters, Vol. 26, No. 4, pp. 587-591
Main authors: Zhou, Wujie; Lv, Sijia; Jiang, Qiuping; Yu, Lu
Format: Journal Article
Language: English
Published: IEEE, 1 April 2019
ISSN: 1070-9908, 1558-2361
Online access: Full text
Abstract Road scene understanding is a difficult task in autonomous driving. In this letter, we propose a novel deep encoder-decoder architecture for end-to-end road scene understanding. This core trainable understanding engine comprises an encoder network, a two-stream decoder network, and a pixel-level fusion network with a classification layer. The encoder network is the front-end model of the classical convolutional neural network VGGNet. The two-stream decoder network includes multi-scale skip connection modules to reduce the down-scaling effect. Finally, the fusion network fuses the two-level information from the two decoder streams for precise pixel-level classification. Additionally, a convolution layer is added to each skip connection module to increase the depth of the architecture. Our architecture achieves outstanding performance on the publicly available CamVid dataset and significantly outperforms previous architectures. This deep architecture is ideal for road scene understanding.
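The abstract gives no layer-level details, so the following is only a minimal Python sketch of two ideas it mentions: the down-scaling caused by a VGG-style encoder, and pixel-level fusion of two decoder streams followed by classification. The five stage widths are those of the standard VGG-16 convolutional front end. The 360x480 input size, the element-wise-sum fusion rule, and the names `encoder_shapes` and `fuse_and_classify` are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# Output channels of the five convolutional stages of the standard VGG-16 front end.
VGG16_STAGE_CHANNELS = [64, 128, 256, 512, 512]


def encoder_shapes(height: int, width: int) -> List[Tuple[int, int, int]]:
    """Return (channels, height, width) after each 2x2, stride-2 max-pooling stage.

    Assumes the convolutions are padded so that only pooling changes the
    spatial size, and that pooling rounds down.
    """
    shapes = []
    for channels in VGG16_STAGE_CHANNELS:
        height, width = height // 2, width // 2  # pooling halves each side
        shapes.append((channels, height, width))
    return shapes


def fuse_and_classify(stream_a, stream_b) -> List[List[int]]:
    """Fuse two per-pixel class-score maps and pick one class per pixel.

    Each stream is indexed as scores[row][col][class]. Element-wise addition
    is an assumed fusion rule, chosen only to keep the sketch short.
    """
    labels = []
    for row_a, row_b in zip(stream_a, stream_b):
        label_row = []
        for px_a, px_b in zip(row_a, row_b):
            fused = [a + b for a, b in zip(px_a, px_b)]
            # Pixel-level classification: index of the largest fused score.
            label_row.append(max(range(len(fused)), key=fused.__getitem__))
        labels.append(label_row)
    return labels


if __name__ == "__main__":
    # Hypothetical 360x480 input frame: the deepest feature map is 32x smaller.
    for shape in encoder_shapes(360, 480):
        print(shape)

    # One row of two pixels, two classes; the streams disagree on both pixels.
    stream_a = [[[3, 1], [1, 2]]]
    stream_b = [[[0, 1], [1, 3]]]
    print(fuse_and_classify(stream_a, stream_b))
```

After five pooling stages each spatial side shrinks by a factor of 32, which is the loss of resolution that skip connections from earlier, higher-resolution stages are meant to offset.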
Author Zhou, Wujie
Lv, Sijia
Yu, Lu
Jiang, Qiuping
Author_xml – sequence: 1
  givenname: Wujie
  orcidid: 0000-0002-3055-2493
  surname: Zhou
  fullname: Zhou, Wujie
  email: wujiezhou@163.com
  organization: School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China
– sequence: 2
  givenname: Sijia
  surname: Lv
  fullname: Lv, Sijia
  email: lvsijia@zust.edu.cn
  organization: School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China
– sequence: 3
  givenname: Qiuping
  surname: Jiang
  fullname: Jiang, Qiuping
  email: jiangqiuping@nbu.edu.cn
  organization: Faculty of Information Science and Engineering, Ningbo University, Ningbo, China
– sequence: 4
  givenname: Lu
  orcidid: 0000-0003-2903-9913
  surname: Yu
  fullname: Yu, Lu
  email: yul@zju.edu.cn
  organization: Institute of Information and Communication Engineering, Zhejiang University, Hangzhou, China
CODEN ISPLEM
CitedBy_id crossref_primary_10_3103_S1060992X23020108
crossref_primary_10_1109_LSP_2021_3089912
crossref_primary_10_3390_s22072654
crossref_primary_10_1016_j_eswa_2021_115090
crossref_primary_10_1109_LSP_2021_3134194
crossref_primary_10_1002_cpe_6767
crossref_primary_10_1016_j_aei_2023_101912
crossref_primary_10_1109_ACCESS_2020_3001679
crossref_primary_10_1109_JSEN_2021_3101497
crossref_primary_10_1109_MITS_2022_3180892
crossref_primary_10_1109_JSEN_2020_3037340
crossref_primary_10_1109_LSP_2020_3048849
Cites_doi 10.1109/CVPR.2015.7299057
10.1109/VCIP.2017.8305148
10.1109/TIP.2018.2794207
10.1016/j.ins.2017.02.049
10.1109/CVPR.2017.243
10.1109/ICCV.2015.178
10.1109/TITS.2017.2688352
10.1109/CVPR.2015.7298965
10.1109/LSP.2018.2865833
10.1109/LSP.2018.2809685
10.1016/j.patrec.2008.04.005
ContentType Journal Article
DOI 10.1109/LSP.2019.2896793
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE/IET Electronic Library
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1558-2361
EndPage 591
ExternalDocumentID 10_1109_LSP_2019_2896793
8630577
Genre orig-research
GrantInformation_xml – fundername: Zhejiang Provincial Natural Science Foundation of China
  grantid: LY18F020012
– fundername: China Postdoctoral Science Foundation
  grantid: 2015M581932
  funderid: 10.13039/501100002858
– fundername: National Natural Science Foundation of China
  grantid: 61502429; 61431015; 61501270
  funderid: 10.13039/501100001809
IEDL.DBID RIE
ISICitedReferencesCount 20
ISSN 1070-9908
IsPeerReviewed true
IsScholarly true
Issue 4
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
ORCID 0000-0002-3055-2493
0000-0003-2903-9913
PageCount 5
PublicationCentury 2000
PublicationDate 2019-April
2019-4-00
PublicationDateYYYYMMDD 2019-04-01
PublicationDate_xml – month: 04
  year: 2019
  text: 2019-April
PublicationDecade 2010
PublicationTitle IEEE signal processing letters
PublicationTitleAbbrev LSP
PublicationYear 2019
Publisher IEEE
Publisher_xml – name: IEEE
SourceID crossref
ieee
SourceType Enrichment Source
Index Database
Publisher
StartPage 587
SubjectTerms Computer architecture
Convolution
Decoding
encoder–decoder architecture
Feature extraction
Fuses
fusion layer
Road scene understanding
Roads
Semantics
skip connection
Title Deep Road Scene Understanding
URI https://ieeexplore.ieee.org/document/8630577
Volume 26