Automatic assignment of biomedical categories: toward a generic approach



Bibliographic Details
Published in: Bioinformatics, Vol. 22, no. 6, pp. 658-664
Main Author: Ruch, Patrick
Format: Journal Article
Language: English
Published: Oxford University Press, England, 15 March 2006; Oxford Publishing Limited (England)
ISSN: 1367-4803, 1460-2059, 1367-4811
Description
Summary:
Motivation: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent.
Methods: To evaluate the robustness of our approach, we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically motivated indexing units.
Results and Conclusion: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe that the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to below 20% for GO, establishing a new baseline for categorizers based on retrieval methods.
Contact: Patrick.Ruch@sim.hcuge.ch
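The summary describes the categorizer only at the level of its architecture: two ranking modules, a pattern matcher and a vector space retrieval engine, whose outputs are combined to rank controlled-vocabulary categories for an input text. The Python sketch below illustrates that general idea under stated assumptions and is not the author's implementation. The tokenizer (no stemming, no phrase indexing), the tf-idf weighting with each category label treated as a "document", the all-or-nothing phrase-match score, the linear combination weighted by `alpha`, and every function name are hypothetical choices made for illustration. It also shows how precision at rank k, the measure quoted in the results, is computed.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric tokens. Assumption: the real system also stems
    # tokens and adds linguistically motivated units; this sketch does neither.
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(categories):
    # Treat each category label as a "document" (retrieval-style categorization)
    # and compute a smoothed inverse document frequency per token.
    n = len(categories)
    df = Counter(tok for label in categories for tok in set(tokenize(label)))
    return {tok: math.log(1 + n / count) for tok, count in df.items()}


def vector(tokens, idf):
    # tf-idf weights; tokens absent from every label get weight 0.
    tf = Counter(tokens)
    return {tok: freq * idf.get(tok, 0.0) for tok, freq in tf.items()}


def cosine(a, b):
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def pattern_score(label, text):
    # Hypothetical pattern matcher: 1.0 if the label's tokens occur as a
    # contiguous phrase in the text (any non-alphanumeric separator), else 0.0.
    toks = tokenize(label)
    if not toks:
        return 0.0
    pattern = r"\b" + r"[^a-z0-9]+".join(map(re.escape, toks)) + r"\b"
    return 1.0 if re.search(pattern, text.lower()) else 0.0


def rank(text, categories, alpha=0.5):
    # Combine the two module scores linearly (assumed fusion rule) and sort
    # categories by descending score, breaking ties alphabetically.
    idf = idf_table(categories)
    doc_vec = vector(tokenize(text), idf)
    scored = []
    for label in categories:
        vs = cosine(doc_vec, vector(tokenize(label), idf))
        pm = pattern_score(label, text)
        scored.append((alpha * vs + (1 - alpha) * pm, label))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [label for _, label in scored]


def precision_at_k(ranked, relevant, k):
    # Fraction of the top-k proposed categories that are correct.
    return sum(1 for label in ranked[:k] if label in relevant) / k


if __name__ == "__main__":
    categories = ["cell nucleus", "dna binding", "mitochondrion", "apoptosis"]
    text = "The protein localizes to the cell nucleus and shows DNA-binding activity."
    ranked = rank(text, categories)
    print(ranked)
    print(precision_at_k(ranked, {"cell nucleus", "dna binding"}, 2))
```

Treating each category label as a short document is what lets such a ranker work without training data, which is the property the summary emphasizes; the price, consistent with the reported gap between MeSH and GO, is that performance depends heavily on how closely the vocabulary's labels resemble the wording of the input texts.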
istex:D55620CA35BE6A1C4A5346A269529F8130D77ADC
ark:/67375/HXZ-BCXB2DZW-9
Associate Editor: Alfonso Valencia
DOI: 10.1093/bioinformatics/bti783