Improving Large-Scale k-Nearest Neighbor Text Categorization with Label Autoencoders
In this paper, we introduce a multi-label lazy learning approach to deal with automatic semantic indexing in large document collections in the presence of complex and structured label vocabularies with high inter-label correlation. The proposed method is an evolution of the traditional k-Nearest Nei...
Uložené v:
| Vydané v: | arXiv.org |
|---|---|
| Hlavní autori: | , , |
| Médium: | Paper |
| Jazyk: | English |
| Vydavateľské údaje: |
Ithaca
Cornell University Library, arXiv.org
03.02.2024
|
| Predmet: | |
| ISSN: | 2331-8422 |
| On-line prístup: | Získať plný text |
| Tagy: |
Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
|
| Shrnutí: | In this paper, we introduce a multi-label lazy learning approach to deal with automatic semantic indexing in large document collections in the presence of complex and structured label vocabularies with high inter-label correlation. The proposed method is an evolution of the traditional k-Nearest Neighbors algorithm which uses a large autoencoder trained to map the large label space to a reduced size latent space and to regenerate the predicted labels from this latent space. We have evaluated our proposal in a large portion of the MEDLINE biomedical document collection which uses the Medical Subject Headings (MeSH) thesaurus as a controlled vocabulary. In our experiments we propose and evaluate several document representation approaches and different label autoencoder configurations. |
|---|---|
| Bibliografia: | SourceType-Working Papers-1 ObjectType-Working Paper/Pre-Print-1 content type line 50 |
| ISSN: | 2331-8422 |
| DOI: | 10.48550/arxiv.2402.01963 |