Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN

Bibliographic Details
Published in:	2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 859-868
Main Authors:	Acuna, David; Ling, Huan; Kar, Amlan; Fidler, Sanja
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01.06.2018
Subjects:	Computer architecture; Decoding; Labeling; Neural networks; Predictive models; Training
ISSN:	1063-6919

Abstract	Manually labeling datasets with object masks is extremely time-consuming. In this work, we follow the idea of Polygon-RNN [4] to produce polygonal annotations of objects interactively using humans-in-the-loop. We introduce several important improvements to the model: 1) we design a new CNN encoder architecture, 2) show how to effectively train the model with Reinforcement Learning, and 3) significantly increase the output resolution using a Graph Neural Network, allowing the model to accurately annotate high-resolution objects in images. Extensive evaluation on the Cityscapes dataset [8] shows that our model, which we refer to as Polygon-RNN++, significantly outperforms the original model in both automatic (10% absolute and 16% relative improvement in mean IoU) and interactive modes (requiring 50% fewer clicks by annotators). We further analyze the cross-domain scenario in which our model is trained on one dataset and used out of the box on datasets from varying domains. The results show that Polygon-RNN++ exhibits powerful generalization capabilities, achieving significant improvements over existing pixel-wise methods. Using simple online fine-tuning, we further achieve a high reduction in annotation time for new datasets, moving a step closer towards an interactive annotation tool to be used in practice.
Author	Ling, Huan; Kar, Amlan; Acuna, David; Fidler, Sanja
Author_xml	1. Acuna, David (davidj@cs.toronto.edu, NVIDIA); 2. Ling, Huan (linghuan@cs.toronto.edu, Vector Institute); 3. Kar, Amlan (amlan@cs.toronto.edu, Vector Institute); 4. Fidler, Sanja (fidler@cs.toronto.edu, Vector Institute)
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DOI	10.1109/CVPR.2018.00096
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Applied Sciences
EISBN	9781538664209 1538664208
EISSN	1063-6919
EndPage	868
ExternalDocumentID	8578194
Genre	orig-research
IEDL.DBID	RIE
IngestDate	Wed Aug 27 02:43:29 EDT 2025
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
PageCount	10
ParticipantIDs	ieee_primary_8578194
PublicationCentury	2000
PublicationDate	2018-Jun
PublicationDateYYYYMMDD	2018-06-01
PublicationDate_xml	– month: 06 year: 2018 text: 2018-Jun
PublicationDecade	2010
PublicationTitle	2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
PublicationTitleAbbrev	CVPR
PublicationYear	2018
Publisher	IEEE
Publisher_xml	– name: IEEE
SourceID	ieee
SourceType	Publisher
StartPage	859
SubjectTerms	Computer architecture; Decoding; Labeling; Neural networks; Predictive models; Training
Title	Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN
URI	https://ieeexplore.ieee.org/document/8578194