Efficient Active Learning by Querying Discriminative and Representative Samples and Fully Exploiting Unlabeled Data

Detailed bibliography
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 32, No. 9, pp. 4111-4122
Main authors: Gu, Bin; Zhai, Zhou; Deng, Cheng; Huang, Heng
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2021
ISSN: 2162-237X (print), 2162-2388 (electronic)
Description
Summary: Active learning is an important learning paradigm in machine learning and data mining, which aims to train effective classifiers with as few labeled samples as possible. Querying discriminative (informative) and representative samples is the state-of-the-art approach for active learning. Fully utilizing the large amount of unlabeled data provides a second chance to improve the performance of active learning. Although several active learning methods that combine active learning with semisupervised learning have been proposed, fast active learning that fully exploits unlabeled data while querying discriminative and representative samples is still an open question. To overcome this challenging issue, in this article, we propose a new efficient batch-mode active learning algorithm. Specifically, we first provide an active learning risk bound that fully accounts for the unlabeled samples when characterizing informativeness and representativeness. Based on this risk bound, we derive a new objective function for batch-mode active learning. We then propose a wrapper algorithm to solve the objective function, which essentially trains a semisupervised classifier and selects discriminative and representative samples alternately. In particular, to avoid retraining the semisupervised classifier from scratch after each query, we design two unique procedures based on the path-following technique, which can efficiently remove multiple queried samples from the unlabeled data set and add them to the labeled data set. Extensive experimental results on a variety of benchmark data sets show not only that our algorithm achieves better generalization performance than state-of-the-art active learning approaches but also that it is significantly more efficient.
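
To make the alternating workflow described in the abstract concrete, below is a minimal Python sketch (using scikit-learn) of a generic batch-mode active learning loop: train a semisupervised classifier, then query a batch of unlabeled samples scored by a mix of informativeness and representativeness. This is an illustration under simplifying assumptions, not the paper's algorithm: the margin and RBF-similarity scores, the `beta` trade-off, and the `oracle` labeling callback are hypothetical placeholders, and the loop retrains from scratch each round, which is exactly the cost the paper's path-following procedures are designed to avoid.

```python
# Illustrative sketch of generic batch-mode active learning with a
# semisupervised classifier (binary classification assumed).
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.metrics.pairwise import rbf_kernel

def query_batch(model, X_pool, batch_size, beta=0.5):
    # Informativeness: samples near the decision boundary (small |margin|) score high.
    informativeness = 1.0 / (1.0 + np.abs(model.decision_function(X_pool)))
    # Representativeness: average RBF similarity to the rest of the unlabeled pool.
    representativeness = rbf_kernel(X_pool).mean(axis=1)
    # beta is an assumed trade-off weight between the two criteria.
    scores = beta * informativeness + (1.0 - beta) * representativeness
    return np.argsort(scores)[-batch_size:]  # indices of the queried batch

def batch_active_learning(X_lab, y_lab, X_pool, oracle, rounds=10, batch_size=5):
    model = None
    for _ in range(rounds):
        # Semisupervised step: self-training over labeled plus unlabeled
        # samples; scikit-learn marks unlabeled samples with the label -1.
        X = np.vstack([X_lab, X_pool])
        y = np.concatenate([y_lab, np.full(len(X_pool), -1)])
        model = SelfTrainingClassifier(SVC(probability=True)).fit(X, y)
        # Query step: move the selected batch from the unlabeled pool
        # to the labeled set; `oracle` is a placeholder that supplies labels.
        idx = query_batch(model, X_pool, batch_size)
        X_lab = np.vstack([X_lab, X_pool[idx]])
        y_lab = np.concatenate([y_lab, oracle(X_pool[idx])])
        X_pool = np.delete(X_pool, idx, axis=0)
    return model
```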
DOI: 10.1109/TNNLS.2020.3016928