Automatic assignment of biomedical categories: toward a generic approach


Bibliographic details
Published in: Bioinformatics, Volume 22, Issue 6, pp. 658-664
Main author: Ruch, Patrick
Format: Journal Article
Language: English
Published: England, Oxford University Press, 15 March 2006
Oxford Publishing Limited (England)
ISSN:1367-4803, 1460-2059, 1367-4811
Description
Summary: Motivation: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent. Methods: To evaluate the robustness of our approach, we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically motivated indexing units. Results and Conclusion: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe that the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to below 20% for GO, establishing a new baseline for categorizers based on retrieval methods. Contact: Patrick.Ruch@sim.hcuge.ch
Bibliography: To whom correspondence should be addressed.
istex:D55620CA35BE6A1C4A5346A269529F8130D77ADC
ark:/67375/HXZ-BCXB2DZW-9
Associate Editor: Alfonso Valencia
DOI: 10.1093/bioinformatics/bti783