Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++

Bibliographic Details
Published in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 859-868
Main Authors: Acuna, David; Ling, Huan; Kar, Amlan; Fidler, Sanja
Format: Conference Proceeding
Language: English
Published: IEEE 01.06.2018
ISSN: 1063-6919
Abstract Manually labeling datasets with object masks is extremely time consuming. In this work, we follow the idea of Polygon-RNN [4] to produce polygonal annotations of objects interactively using humans-in-the-loop. We introduce several important improvements to the model: 1) we design a new CNN encoder architecture, 2) show how to effectively train the model with Reinforcement Learning, and 3) significantly increase the output resolution using a Graph Neural Network, allowing the model to accurately annotate high-resolution objects in images. Extensive evaluation on the Cityscapes dataset [8] shows that our model, which we refer to as Polygon-RNN++, significantly outperforms the original model in both automatic (10% absolute and 16% relative improvement in mean IoU) and interactive modes (requiring 50% fewer clicks by annotators). We further analyze the cross-domain scenario in which our model is trained on one dataset, and used out of the box on datasets from varying domains. The results show that Polygon-RNN++ exhibits powerful generalization capabilities, achieving significant improvements over existing pixel-wise methods. Using simple online fine-tuning we further achieve a high reduction in annotation time for new datasets, moving a step closer towards an interactive annotation tool to be used in practice.
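The mean IoU figures quoted in the abstract refer to intersection over union between predicted and ground-truth object masks, and the abstract distinguishes an absolute gain from a relative one. The following is a minimal sketch of both quantities, not the authors' evaluation code: the function names `mask_iou` and `relative_gain`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels (hypothetical representation)


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over a non-zero `baseline`."""
    return (improved - baseline) / baseline


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [1, 1, 0]]
    target = [[0, 1, 1],
              [0, 1, 1]]
    print(mask_iou(pred, target))    # 2 shared pixels out of 6 covered
    print(relative_gain(0.5, 0.6))   # absolute gain 0.1, relative gain about 0.2
```

The absolute gain is the plain difference between two IoU scores, while the relative gain divides that difference by the baseline score, which is why the two percentages reported for the same comparison differ.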
Author Ling, Huan
Kar, Amlan
Acuna, David
Fidler, Sanja
Author_xml – sequence: 1
  givenname: David
  surname: Acuna
  fullname: Acuna, David
  email: davidj@cs.toronto.edu
  organization: NVIDIA
– sequence: 2
  givenname: Huan
  surname: Ling
  fullname: Ling, Huan
  email: linghuan@cs.toronto.edu
  organization: Vector Institute
– sequence: 3
  givenname: Amlan
  surname: Kar
  fullname: Kar, Amlan
  email: amlan@cs.toronto.edu
  organization: Vector Institute
– sequence: 4
  givenname: Sanja
  surname: Fidler
  fullname: Fidler, Sanja
  email: fidler@cs.toronto.edu
  organization: Vector Institute
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR.2018.00096
Discipline Applied Sciences
EISBN 9781538664209
1538664208
EISSN 1063-6919
EndPage 868
ExternalDocumentID 8578194
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
PublicationCentury 2000
PublicationDate 2018-Jun
PublicationDateYYYYMMDD 2018-06-01
PublicationDate_xml – month: 06
  year: 2018
  text: 2018-Jun
PublicationDecade 2010
PublicationTitle 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
PublicationTitleAbbrev CVPR
PublicationYear 2018
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 859
SubjectTerms Computer architecture
Decoding
Labeling
Neural networks
Predictive models
Training
Title Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++
URI https://ieeexplore.ieee.org/document/8578194