Automatic assignment of biomedical categories: toward a generic approach

Bibliographic Details
Published in: Bioinformatics Vol. 22; no. 6; pp. 658-664
Main Author: Ruch, Patrick
Format: Journal Article
Language: English
Published: England Oxford University Press 15.03.2006
Oxford Publishing Limited (England)
Subjects:
ISSN: 1367-4803, 1460-2059, 1367-4811
Abstract Motivation: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent.
Methods: In order to evaluate the robustness of our approach we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units.
Results and Conclusion: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to <20% for GO, establishing a new baseline for categorizers based on retrieval methods.
Contact: Patrick.Ruch@sim.hcuge.ch
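The Methods description in the abstract (a pattern matcher combined with a vector space retrieval engine, each ranking candidate categories) can be illustrated with a small sketch. The Python below is not the author's system: the function names, the TF-IDF weighting, the exact-phrase bonus, and the linear combination controlled by `weight` are all assumptions chosen for illustration, and plain lowercased tokens stand in for the stems and phrase units the abstract mentions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased alphanumeric tokens (a crude stand-in for stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_categories(text, categories, weight=0.5):
    """Rank controlled-vocabulary terms for a text (hypothetical scoring).

    Each category gets a cosine similarity over TF-IDF vectors (the
    vector space part) and a 0/1 bonus when the full term occurs as a
    phrase in the text (the pattern matching part). The two scores are
    mixed linearly; this mixing rule is an assumption, not the paper's.
    """
    term_vectors = {cat: Counter(tokenize(cat)) for cat in categories}
    n_terms = len(term_vectors)
    # Document frequency: in how many category terms each token appears.
    doc_freq = Counter(tok for vec in term_vectors.values() for tok in vec)
    idf = {tok: math.log(1.0 + n_terms / df) for tok, df in doc_freq.items()}

    def weighted(vec):
        # Tokens absent from the vocabulary carry no weight.
        return {tok: tf * idf[tok] for tok, tf in vec.items() if tok in idf}

    def cosine(a, b):
        dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values())
        )
        return dot / norm if norm else 0.0

    tokens = tokenize(text)
    query = weighted(Counter(tokens))
    padded_text = " " + " ".join(tokens) + " "

    ranked = []
    for cat, vec in term_vectors.items():
        retrieval_score = cosine(query, weighted(vec))
        # Space padding enforces whole-word matching of the phrase.
        phrase = " " + " ".join(tokenize(cat)) + " "
        pattern_score = 1.0 if vec and phrase in padded_text else 0.0
        score = weight * retrieval_score + (1.0 - weight) * pattern_score
        ranked.append((cat, score))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    cats = ["Gene Ontology", "Medical Subject Headings", "Protein Binding"]
    sample = "We annotate proteins with Gene Ontology terms."
    for cat, score in rank_categories(sample, cats):
        print(f"{score:.2f}  {cat}")
```

Because the sketch needs only the vocabulary itself and no labelled training documents, it reflects the "largely data-independent" idea in spirit; the "precision at high ranks" figures quoted in the abstract would correspond to checking how often the top-ranked entries of such a list are correct.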
Author Ruch, Patrick
Author_xml – sequence: 1
  givenname: Patrick
  surname: Ruch
  fullname: Ruch, Patrick
BackLink https://www.ncbi.nlm.nih.gov/pubmed/16287934$$D View this record in MEDLINE/PubMed
CODEN BOINFP
CitedBy_id crossref_primary_10_1016_j_jbi_2007_06_004
crossref_primary_10_1186_1471_2105_10_129
crossref_primary_10_1186_1471_2105_10_148
crossref_primary_10_1186_gb_2008_9_s2_s6
crossref_primary_10_1371_journal_pone_0115892
crossref_primary_10_1002_asi_21170
crossref_primary_10_1002_asi_23351
crossref_primary_10_1016_j_artmed_2012_08_006
crossref_primary_10_1186_s13326_017_0123_3
crossref_primary_10_1186_gb_2008_9_s2_s3
crossref_primary_10_1002_asi_23290
crossref_primary_10_1038_srep32252
crossref_primary_10_1197_jamia_M2431
crossref_primary_10_1186_1471_2105_9_S3_S9
crossref_primary_10_1016_j_jbi_2008_07_003
crossref_primary_10_1016_j_ijmedinf_2006_05_002
crossref_primary_10_1016_j_ymeth_2014_10_023
crossref_primary_10_1016_j_jbi_2015_04_013
crossref_primary_10_1371_journal_pone_0209961
crossref_primary_10_1186_1471_2105_9_S5_S3
crossref_primary_10_1016_j_websem_2011_11_009
crossref_primary_10_1016_j_irbm_2012_10_002
crossref_primary_10_1186_1471_2105_14_171
crossref_primary_10_1186_s13326_016_0073_1
crossref_primary_10_1186_1471_2105_10_313
crossref_primary_10_1007_s10278_015_9792_6
crossref_primary_10_1186_1471_2105_12_S8_S2
crossref_primary_10_2196_jmir_2043
crossref_primary_10_1093_bib_bbaa394
crossref_primary_10_1155_2023_2989791
crossref_primary_10_1002_asi_23435
crossref_primary_10_1002_asi_23972
crossref_primary_10_1007_s10257_014_0259_y
crossref_primary_10_1016_j_jbi_2017_07_011
crossref_primary_10_1002_jrsm_27
crossref_primary_10_1093_bioinformatics_btp249
crossref_primary_10_1186_1471_2105_14_208
crossref_primary_10_1155_2008_342746
crossref_primary_10_1136_amiajnl_2010_000055
crossref_primary_10_1186_1471_2105_9_S8_S2
crossref_primary_10_1186_s13326_016_0096_7
Cites_doi 10.1177/002383096500800404
10.1186/1471-2105-6-S1-S1
10.1186/1471-2105-6-S1-S23
10.1016/S1386-5056(02)00057-6
10.1101/gr.461403
10.1145/1067268.1067273
10.1162/coli.2003.29.2.328
10.1016/0020-0271(71)90024-6
10.1023/A:1007413511361
10.1016/0010-4825(95)00055-0
10.1186/1471-2105-4-20
10.3115/1072228.1072370
10.1007/978-3-540-31865-1_9
ContentType Journal Article
Copyright Copyright Oxford University Press(England) Mar 15, 2006
Copyright_xml – notice: Copyright Oxford University Press(England) Mar 15, 2006
DOI 10.1093/bioinformatics/bti783
DatabaseName Istex
CrossRef
Medline
MEDLINE
MEDLINE (Ovid)
PubMed
Aluminium Industry Abstracts
Biotechnology Research Abstracts
Ceramic Abstracts
Computer and Information Systems Abstracts
Corrosion Abstracts
Electronics & Communications Abstracts
Engineered Materials Abstracts
Materials Business File
Mechanical & Transportation Engineering Abstracts
Nucleic Acids Abstracts
Oncogenes and Growth Factors Abstracts
Solid State and Superconductivity Abstracts
METADEX
Technology Research Database
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
Aerospace Database
Copper Technical Reference Library
AIDS and Cancer Research Abstracts
Materials Research Database
ProQuest Computer Science Collection
ProQuest Health & Medical Complete (Alumni)
Civil Engineering Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Biotechnology and BioEngineering Abstracts
MEDLINE - Academic
DatabaseTitle CrossRef
MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
Materials Research Database
Oncogenes and Growth Factors Abstracts
Technology Research Database
Computer and Information Systems Abstracts – Academic
Mechanical & Transportation Engineering Abstracts
Nucleic Acids Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
ProQuest Health & Medical Complete (Alumni)
Materials Business File
Aerospace Database
Copper Technical Reference Library
Engineered Materials Abstracts
Biotechnology Research Abstracts
AIDS and Cancer Research Abstracts
Advanced Technologies Database with Aerospace
ANTE: Abstracts in New Technology & Engineering
Civil Engineering Abstracts
Aluminium Industry Abstracts
Electronics & Communications Abstracts
Ceramic Abstracts
METADEX
Biotechnology and BioEngineering Abstracts
Computer and Information Systems Abstracts Professional
Solid State and Superconductivity Abstracts
Engineering Research Database
Corrosion Abstracts
MEDLINE - Academic
DatabaseTitleList
MEDLINE - Academic
MEDLINE
CrossRef
Engineering Research Database
Materials Research Database
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Biology
EISSN 1460-2059
1367-4811
EndPage 664
ExternalDocumentID 1006454661
16287934
10_1093_bioinformatics_bti783
ark_67375_HXZ_BCXB2DZW_9
Genre Evaluation Studies
Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
ID FETCH-LOGICAL-c516t-4e31c6082874ff4c086b5b62703e9c79f937aa2225adcd15ff2aa5ce9f4219e03
ISICitedReferencesCount 68
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000236111600004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1367-4803
IngestDate Fri Sep 05 07:17:58 EDT 2025
Mon Oct 06 18:05:53 EDT 2025
Mon Oct 06 17:26:49 EDT 2025
Wed Feb 19 01:43:22 EST 2025
Tue Nov 18 21:21:01 EST 2025
Sat Nov 29 05:33:27 EST 2025
Sat Sep 20 11:02:08 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 6
Language English
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c516t-4e31c6082874ff4c086b5b62703e9c79f937aa2225adcd15ff2aa5ce9f4219e03
Notes To whom correspondence should be addressed.
istex:D55620CA35BE6A1C4A5346A269529F8130D77ADC
ark:/67375/HXZ-BCXB2DZW-9
Associate Editor: Alfonso Valencia
OpenAccessLink https://academic.oup.com/bioinformatics/article-pdf/22/6/658/48839321/bioinformatics_22_6_658.pdf
PMID 16287934
PQID 198745368
PQPubID 36124
PageCount 7
ParticipantIDs proquest_miscellaneous_67732043
proquest_miscellaneous_19431727
proquest_journals_198745368
pubmed_primary_16287934
crossref_primary_10_1093_bioinformatics_bti783
crossref_citationtrail_10_1093_bioinformatics_bti783
istex_primary_ark_67375_HXZ_BCXB2DZW_9
PublicationCentury 2000
PublicationDate 2006-03-15
PublicationDateYYYYMMDD 2006-03-15
PublicationDate_xml – month: 03
  year: 2006
  text: 2006-03-15
  day: 15
PublicationDecade 2000
PublicationPlace England
PublicationPlace_xml – name: England
– name: Oxford
PublicationTitle Bioinformatics
PublicationTitleAlternate Bioinformatics
PublicationYear 2006
Publisher Oxford University Press
Oxford Publishing Limited (England)
Publisher_xml – name: Oxford University Press
– name: Oxford Publishing Limited (England)
SSID ssj0051444
ssj0005056
Score 2.1895583
SourceID proquest
pubmed
crossref
istex
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 658
SubjectTerms Abstracting and Indexing as Topic - methods
Algorithms
Artificial Intelligence
Documentation - methods
MEDLINE
Natural Language Processing
Pattern Recognition, Automated - methods
Periodicals as Topic
Proteins - classification
Title Automatic assignment of biomedical categories: toward a generic approach
URI https://api.istex.fr/ark:/67375/HXZ-BCXB2DZW-9/fulltext.pdf
https://www.ncbi.nlm.nih.gov/pubmed/16287934
https://www.proquest.com/docview/198745368
https://www.proquest.com/docview/19431727
https://www.proquest.com/docview/67732043
Volume 22
WOSCitedRecordID wos000236111600004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVASL
  databaseName: Oxford Journals Open Access Collection
  customDbUrl:
  eissn: 1460-2059
  dateEnd: 20220930
  omitProxy: false
  ssIdentifier: ssj0005056
  issn: 1367-4803
  databaseCode: TOX
  dateStart: 19850101
  isFulltext: true
  titleUrlDefault: https://academic.oup.com/journals/
  providerName: Oxford University Press
– providerCode: PRVASL
  databaseName: Oxford Journals Open Access Collection
  customDbUrl:
  eissn: 1460-2059
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0005056
  issn: 1367-4803
  databaseCode: TOX
  dateStart: 19850101
  isFulltext: true
  titleUrlDefault: https://academic.oup.com/journals/
  providerName: Oxford University Press
linkProvider Oxford University Press
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Automatic+assignment+of+biomedical+categories%3A+toward+a+generic+approach&rft.jtitle=Bioinformatics&rft.date=2006-03-15&rft.pub=Oxford+University+Press&rft.issn=1367-4803&rft.eissn=1460-2059&rft.volume=22&rft.issue=6&rft.spage=658&rft.epage=664&rft_id=info:doi/10.1093%2Fbioinformatics%2Fbti783&rft.externalDBID=n%2Fa&rft.externalDocID=ark_67375_HXZ_BCXB2DZW_9