Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++

Bibliographic details
Published in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 859-868
Main authors: Acuna, David; Ling, Huan; Kar, Amlan; Fidler, Sanja
Format: Conference paper
Language: English
Publication details: IEEE, 1 June 2018
ISSN:1063-6919
Abstract Manually labeling datasets with object masks is extremely time consuming. In this work, we follow the idea of Polygon-RNN [4] to produce polygonal annotations of objects interactively using humans-in-the-loop. We introduce several important improvements to the model: 1) we design a new CNN encoder architecture, 2) show how to effectively train the model with Reinforcement Learning, and 3) significantly increase the output resolution using a Graph Neural Network, allowing the model to accurately annotate high-resolution objects in images. Extensive evaluation on the Cityscapes dataset [8] shows that our model, which we refer to as Polygon-RNN++, significantly outperforms the original model in both automatic (10% absolute and 16% relative improvement in mean IoU) and interactive modes (requiring 50% fewer clicks by annotators). We further analyze the cross-domain scenario in which our model is trained on one dataset, and used out of the box on datasets from varying domains. The results show that Polygon-RNN++ exhibits powerful generalization capabilities, achieving significant improvements over existing pixel-wise methods. Using simple online fine-tuning we further achieve a high reduction in annotation time for new datasets, moving a step closer towards an interactive annotation tool to be used in practice.
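The abstract reports gains in mean IoU in both absolute and relative terms. As a minimal illustrative sketch (not the authors' evaluation code), the Python below computes intersection-over-union for two small binary masks and shows how an absolute improvement differs from a relative one. The function names `mask_iou` and `improvement`, the toy masks, and the scores 0.50 and 0.60 are assumptions chosen for illustration; they are not values from the paper.

```python
def mask_iou(pred, target):
    """IoU of two equally sized binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            if p and t:
                inter += 1  # pixel labeled as object in both masks
            if p or t:
                union += 1  # pixel labeled as object in at least one mask
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def improvement(baseline, improved):
    """Return (absolute, relative) improvement of a score over a baseline."""
    absolute = improved - baseline
    relative = absolute / baseline
    return absolute, relative


# Hypothetical 3x3 masks: they overlap in one column of two pixels.
pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]

iou = mask_iou(pred, target)  # 2 shared pixels / 6 pixels in the union
print(f"IoU: {iou:.3f}")

# Hypothetical scores, not taken from the paper.
absolute, relative = improvement(0.50, 0.60)
print(f"absolute: {absolute:.2f}, relative: {relative:.2f}")
```

With these toy scores, a gain of 10 points in absolute terms corresponds to a 20% relative gain, because the relative figure is measured against the baseline score rather than against the full 0 to 1 range.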
Author Ling, Huan
Kar, Amlan
Acuna, David
Fidler, Sanja
Author_xml – sequence: 1
  givenname: David
  surname: Acuna
  fullname: Acuna, David
  email: davidj@cs.toronto.edu
  organization: NVIDIA
– sequence: 2
  givenname: Huan
  surname: Ling
  fullname: Ling, Huan
  email: linghuan@cs.toronto.edu
  organization: Vector Institute
– sequence: 3
  givenname: Amlan
  surname: Kar
  fullname: Kar, Amlan
  email: amlan@cs.toronto.edu
  organization: Vector Institute
– sequence: 4
  givenname: Sanja
  surname: Fidler
  fullname: Fidler, Sanja
  email: fidler@cs.toronto.edu
  organization: Vector Institute
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR.2018.00096
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Discipline Applied Sciences
EISBN 9781538664209
1538664208
EISSN 1063-6919
EndPage 868
ExternalDocumentID 8578194
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
PublicationCentury 2000
PublicationDate 2018-Jun
PublicationDateYYYYMMDD 2018-06-01
PublicationDate_xml – month: 06
  year: 2018
  text: 2018-Jun
PublicationDecade 2010
PublicationTitle 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
PublicationTitleAbbrev CVPR
PublicationYear 2018
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 859
SubjectTerms Computer architecture
Decoding
Labeling
Neural networks
Predictive models
Training
Title Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++
URI https://ieeexplore.ieee.org/document/8578194