Automatic assignment of biomedical categories: toward a generic approach
| Published in: | Bioinformatics Vol. 22; no. 6; pp. 658 - 664 |
|---|---|
| Main Author: | |
| Format: | Journal Article |
| Language: | English |
| Published: | England; Oxford University Press; 15.03.2006; Oxford Publishing Limited (England) |
| Subjects: | |
| ISSN: | 1367-4803, 1460-2059, 1367-4811 |
| Summary: | Motivation: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent. Methods: In order to evaluate the robustness of our approach we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units. Results and Conclusion: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to <20% for GO, establishing a new baseline for categorizers based on retrieval methods. Contact: Patrick.Ruch@sim.hcuge.ch |
| Bibliography: | Associate Editor: Alfonso Valencia; istex:D55620CA35BE6A1C4A5346A269529F8130D77ADC; ark:/67375/HXZ-BCXB2DZW-9 |
| DOI: | 10.1093/bioinformatics/bti783 |