Efficient Active Learning by Querying Discriminative and Representative Samples and Fully Exploiting Unlabeled Data
| Published in: | IEEE Transactions on Neural Networks and Learning Systems, Vol. 32, No. 9, pp. 4111-4122 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.09.2021 |
| Subjects: | |
| ISSN: | 2162-237X, 2162-2388 |
| Summary: | Active learning is an important paradigm in machine learning and data mining that aims to train effective classifiers with as few labeled samples as possible. Querying discriminative (informative) and representative samples is the state-of-the-art approach to active learning, and fully utilizing the large amount of unlabeled data offers a second chance to improve its performance. Although several active learning methods have been proposed that incorporate semisupervised learning, fast active learning that fully exploits unlabeled data while querying discriminative and representative samples remains an open question. To address this challenge, in this article we propose a new efficient batch-mode active learning algorithm. Specifically, we first derive an active learning risk bound that fully accounts for the unlabeled samples in characterizing informativeness and representativeness. Based on this risk bound, we derive a new objective function for batch-mode active learning. We then propose a wrapper algorithm to solve the objective function, which alternately trains a semisupervised classifier and selects discriminative and representative samples. In particular, to avoid retraining the semisupervised classifier from scratch after each query, we design two unique procedures based on the path-following technique, which efficiently remove multiple queried samples from the unlabeled data set and add them to the labeled data set. Extensive experiments on a variety of benchmark data sets show that our algorithm not only achieves better generalization performance than state-of-the-art active learning approaches but is also significantly more efficient. |
|---|---|
| ISSN: | 2162-237X, 2162-2388 |
| DOI: | 10.1109/TNNLS.2020.3016928 |
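The wrapper loop sketched in the abstract, alternating between training a semisupervised classifier and querying a batch of samples scored on both informativeness and representativeness, can be illustrated as follows. This is a minimal sketch under assumed design choices, not the paper's method: the entropy and cosine-similarity scores, the trade-off weight `beta`, the batch size, and scikit-learn's `SelfTrainingClassifier` are all illustrative stand-ins, and the paper's path-following procedures for avoiding retraining from scratch are not reproduced here.

```python
# Illustrative batch-mode active learning loop: alternate semisupervised
# training with queries that balance informativeness and representativeness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X, y_true = make_classification(n_samples=400, n_features=10, random_state=0)

# -1 marks unlabeled samples (scikit-learn's semisupervised convention);
# start with five labeled samples per class.
y = np.full(len(X), -1)
labeled = np.concatenate(
    [rng.choice(np.flatnonzero(y_true == c), size=5, replace=False) for c in (0, 1)]
)
y[labeled] = y_true[labeled]

beta, batch_size = 0.5, 5  # hypothetical trade-off weight and batch size
for _ in range(10):
    # Step 1: train a semisupervised classifier on labeled + unlabeled data.
    clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X, y)

    unlabeled = np.flatnonzero(y == -1)
    proba = clf.predict_proba(X[unlabeled])

    # Informativeness: prediction entropy (high near the decision boundary).
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)

    # Representativeness: mean cosine similarity to the rest of the pool.
    represent = cosine_similarity(X[unlabeled]).mean(axis=1)

    # Step 2: query the batch maximizing the combined score, reveal its
    # labels, and move it from the unlabeled pool to the labeled set.
    score = entropy + beta * represent
    batch = unlabeled[np.argsort(score)[-batch_size:]]
    y[batch] = y_true[batch]  # oracle call
```

In this sketch the queried batch is simply relabeled in place, whereas the paper's contribution is precisely to update the semisupervised solution incrementally (via path following) when samples move between the unlabeled and labeled sets, rather than refitting each round as done above.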