Mixed-Supervised Scene Text Detection With Expectation-Maximization Algorithm

Detailed bibliography
Published in:	IEEE Transactions on Image Processing, Vol. 31, pp. 5513-5528
Main authors:	Zhao, Mengbiao; Feng, Wei; Yin, Fei; Zhang, Xu-Yao; Liu, Cheng-Lin
Type:	Journal Article
Language:	English
Publisher:	New York: IEEE, 2022 (The Institute of Electrical and Electronics Engineers, Inc.)
Subjects:	Algorithms; Annotations; Benchmark testing; Benchmarks; Computer vision; Costs; Data models; Datasets; Detectors; expectation-maximization algorithm; Labeling; Labels; Maximization; Mixed-supervised learning; Optimization; Polygons; scene text detection; Supervised learning; Training; weak supervision forms
ISSN:	1057-7149, 1941-0042

Description
Abstract:	Scene text detection is an important and challenging task in computer vision. For detecting arbitrarily-shaped texts, most existing methods require heavy data labeling efforts to produce polygon-level text region labels for supervised training. To reduce the cost of data labeling, we study mixed-supervised arbitrarily-shaped text detection by combining various weak supervision forms (e.g., image-level tags and coarse, loose, and tight bounding boxes), which are far easier to annotate. Existing weakly-supervised learning methods (such as multiple instance learning) do not promote full object coverage; to approximate the performance of fully-supervised detection, we therefore propose an Expectation-Maximization (EM) based mixed-supervised learning framework that trains a scene text detector using only a small amount of polygon-level annotated data combined with a large amount of weakly annotated data. The polygon-level labels are treated as latent variables and recovered from the weak labels by the EM algorithm. A new contour-based scene text detector is also proposed to facilitate the use of weak labels in our mixed-supervised learning framework. Extensive experiments on six scene text benchmarks show that (1) using only 10% strongly annotated data and 90% weakly annotated data, our method yields performance comparable to that of fully supervised methods, and (2) with 100% strongly annotated data, our method achieves state-of-the-art performance on five scene text benchmarks (CTW1500, Total-Text, ICDAR-ArT, MSRA-TD500, and C-SVT) and competitive results on the ICDAR2015 dataset. We will make our weakly annotated datasets publicly available.
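The core idea of the abstract is to treat the full (polygon-level) labels as latent variables and recover them from weak labels with EM. A toy illustration follows. It is not the authors' detector or training procedure. It makes several hypothetical simplifications: polygons become 1-D per-pixel text masks, the detector becomes a two-class Gaussian intensity model, and a bounding box becomes an index interval [start, end) outside of which no text may appear. All names (Sample, gaussian_pdf, e_step, m_step, run_em) and all data values are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy model (not the paper's detector): each pixel is text (z=1)
# or background (z=0), with one Gaussian intensity model per class. The
# per-pixel labels stand in for the latent "polygon-level" labels.
Params = Tuple[List[float], List[float], List[float]]  # (means, variances, priors)


@dataclass
class Sample:
    pixels: List[float]
    mask: Optional[List[int]] = None       # strong label: 1 = text, 0 = background
    box: Optional[Tuple[int, int]] = None  # weak label: text can only lie in [start, end)


def gaussian_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step(samples: List[Sample], params: Params) -> List[List[float]]:
    """Posterior probability that each pixel is text under the current model."""
    mu, var, prior = params
    resp = []
    for s in samples:
        r = []
        for i, x in enumerate(s.pixels):
            if s.mask is not None:
                r.append(float(s.mask[i]))  # strongly labeled: latent label is observed
            elif not (s.box[0] <= i < s.box[1]):
                r.append(0.0)  # outside the weak box: background by constraint
            else:
                p_text = prior[1] * gaussian_pdf(x, mu[1], var[1])
                p_bg = prior[0] * gaussian_pdf(x, mu[0], var[0])
                denom = p_text + p_bg
                r.append(p_text / denom if denom > 0 else prior[1])
        resp.append(r)
    return resp


def m_step(samples: List[Sample], resp: List[List[float]], min_var: float = 1e-3) -> Params:
    """Re-estimate class means, variances and priors from the soft labels."""
    xs = [x for s in samples for x in s.pixels]
    rs = [r for rr in resp for r in rr]
    mu, var, prior = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for k in (0, 1):
        w = [r if k == 1 else 1.0 - r for r in rs]  # weight of class k per pixel
        total = sum(w)
        mu[k] = sum(wi * x for wi, x in zip(w, xs)) / total
        var[k] = max(sum(wi * (x - mu[k]) ** 2 for wi, x in zip(w, xs)) / total, min_var)
        prior[k] = total / len(xs)
    return mu, var, prior


def run_em(samples: List[Sample], params: Params, iters: int = 20) -> Tuple[Params, List[List[float]]]:
    """Alternate E- and M-steps, then return the final model and soft masks."""
    for _ in range(iters):
        params = m_step(samples, e_step(samples, params))
    return params, e_step(samples, params)


# Toy data: one strongly labeled row and one weakly labeled (box-only) row.
samples = [
    Sample(pixels=[0.1, 0.2, 0.9, 1.0, 0.15], mask=[0, 0, 1, 1, 0]),
    Sample(pixels=[0.05, 0.95, 0.85, 0.12, 0.1], box=(1, 4)),
]
params, resp = run_em(samples, ([0.3, 0.7], [0.1, 0.1], [0.5, 0.5]))
print([round(r, 2) for r in resp[1]])  # recovered soft mask for the weak row
```

In this sketch the box plays the role a weak label plays in the abstract: it rules out text outside the box. The strongly labeled row anchors the class statistics so that the E-step can fill in the mask inside the box. In the paper, the model being trained is a contour-based scene text detector rather than anything like this Gaussian model.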
DOI:	10.1109/TIP.2022.3197987