Automatic assignment of biomedical categories: toward a generic approach
| Published in: | Bioinformatics, Volume 22, Issue 6, pp. 658-664 |
|---|---|
| Main author: | |
| Format: | Journal Article |
| Language: | English |
| Published: | England; Oxford University Press; 15 March 2006; Oxford Publishing Limited (England) |
| Subjects: | |
| ISSN: | 1367-4803, 1460-2059, 1367-4811 |
| Summary: | Motivation: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent. Methods: In order to evaluate the robustness of our approach, we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units. Results and Conclusion: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe that the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to <20% for GO, establishing a new baseline for categorizers based on retrieval methods. Contact: Patrick.Ruch@sim.hcuge.ch |
| Bibliography: | To whom correspondence should be addressed. Associate Editor: Alfonso Valencia. istex:D55620CA35BE6A1C4A5346A269529F8130D77ADC; ark:/67375/HXZ-BCXB2DZW-9 |
| DOI: | 10.1093/bioinformatics/bti783 |