The use of unlabelled data for supervised learning

Bibliographic details
Title: The use of unlabelled data for supervised learning
Authors: Dara, Rozita
Contributors: Stacey, Deborah A., Kremer, Stefan C., The Atrium at the University of Guelph
Publisher information: University of Guelph, 2001.
Publication year: 2001
Subjects: supervised learning algorithm, classification, Guelph Cluster Class, unsupervised network, train, Self-Organizing Map, unlabelled data, labelled data
Description: When provided with enough labelled training examples, a supervised learning algorithm can learn reasonably accurately. However, creating sufficient labelled data to train accurate classifiers is time-consuming and expensive. On the other hand, unlabelled data is usually easy to obtain. This research introduces a novel approach, Guelph Cluster Class (GCC), which improves the task of classification with the use of unlabelled data. The novelty of this approach lies in the use of an unsupervised network, the 'Self-Organizing Map', to select natural clusters in labelled and unlabelled data. Sub-classes (formed from labelled data) are used to assign labels to unlabelled patterns to produce 'self-labelled' data. The performance of several variants of the GCC system has been obtained by running a 'Back-Propagation' network on labelled and self-labelled data. Results of experiments on several benchmark datasets demonstrate improved classification performance even when the amount of labelled data is very small.
Document type: Thesis
File description: application/pdf
Language: English
Access URL: https://hdl.handle.net/10214/20545
Accession number: edsair.od.......453..32fc6bd83c1af70748fbf7b777aa785b
Database: OpenAIRE
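
Illustrative sketch: the self-labelling idea summarised in the description can be outlined in a few lines of Python. This is not the thesis's GCC implementation. The 1-D map, the majority-vote labelling rule, the function names (`nearest_unit`, `train_som`, `self_label`) and all parameter values are assumptions made for illustration only, and the final 'Back-Propagation' training step is left out.

```python
import math
import random
from collections import Counter


def nearest_unit(weights, x):
    """Return the index of the map unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def train_som(data, n_units=4, epochs=30, lr=0.5, seed=0):
    """Train a tiny 1-D self-organizing map (an assumed simplification)."""
    rng = random.Random(seed)
    # Start each unit at a randomly chosen input pattern.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs            # shrinks from 1 towards 0
        sigma = max(0.1, decay * n_units / 2)   # neighbourhood width
        for x in rng.sample(data, len(data)):   # shuffled pass over the data
            bmu = nearest_unit(weights, x)      # best-matching unit
            for i, w in enumerate(weights):
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * decay * h * (x[d] - w[d])
    return weights


def self_label(weights, labelled, unlabelled):
    """Give each unlabelled pattern the majority label of its map unit."""
    votes = {}
    for x, y in labelled:
        votes.setdefault(nearest_unit(weights, x), Counter())[y] += 1
    result = []
    for x in unlabelled:
        unit = nearest_unit(weights, x)
        if unit in votes:  # patterns on units with no labelled members are skipped
            result.append((x, votes[unit].most_common(1)[0][0]))
    return result


# Hypothetical toy data: two labelled patterns and four unlabelled ones.
labelled = [((0.0, 0.1), "a"), ((5.0, 5.2), "b")]
unlabelled = [(0.2, 0.0), (0.1, 0.3), (4.9, 5.1), (5.3, 4.8)]
som = train_som([x for x, _ in labelled] + unlabelled, n_units=2)
extra = self_label(som, labelled, unlabelled)
```

In a full pipeline, the pairs in `extra` would be combined with `labelled` and passed to a supervised classifier, which is the role the description assigns to the 'Back-Propagation' network.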