Feature engineering for MEDLINE citation categorization with MeSH

Bibliographic Details
Published in: BMC Bioinformatics, Vol. 16, no. 1, p. 113
Main Authors: Jimeno Yepes, Antonio Jose; Plaza, Laura; Carrillo-de-Albornoz, Jorge; Mork, James G; Aronson, Alan R
Format: Journal Article
Language: English
Published: London: BioMed Central (BioMed Central Ltd), 8 April 2015
ISSN: 1471-2105
Abstract

Background: Research in biomedical text categorization has mostly used the bag-of-words representation. Other more sophisticated representations of text based on syntactic, semantic and argumentative properties have been less studied. In this paper, we evaluate the impact of different text representations of biomedical texts as features for reproducing the MeSH annotations of some of the most frequent MeSH headings. In addition to unigrams and bigrams, these features include noun phrases, citation meta-data, citation structure, and semantic annotation of the citations.

Results: Traditional features like unigrams and bigrams exhibit strong performance compared to other feature sets. Little or no improvement is obtained when using meta-data or citation structure. Noun phrases are too sparse and thus have lower performance compared to more traditional features. Conceptual annotation of the texts by MetaMap shows similar performance compared to unigrams, but adding concepts from the UMLS taxonomy does not improve the performance of using only mapped concepts. The combination of all the features performs largely better than any individual feature set considered. In addition, this combination improves the performance of a state-of-the-art MeSH indexer. Concerning the machine learning algorithms, we find that those that are more resilient to class imbalance largely obtain better performance.

Conclusions: We conclude that even though traditional features such as unigrams and bigrams have strong performance compared to other features, it is possible to combine them to effectively improve the performance of the bag-of-words representation. We have also found that the combination of the learning algorithm and feature sets has an influence in the overall performance of the system. Moreover, using learning algorithms resilient to class imbalance largely improves performance. However, when using a large set of features, consideration needs to be taken with algorithms due to the risk of over-fitting. Specific combinations of learning algorithms and features for individual MeSH headings could further increase the performance of an indexing system.
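As a rough illustration of what the abstract means by combining feature sets, the following minimal Python sketch builds unigram, bigram and meta-data features for a single citation and merges them into one sparse representation. The dict field names (title, abstract, journal), the feature-name prefixes and the simple whitespace tokenizer are assumptions made for this illustration; the record does not describe the authors' actual pipeline, and noun phrases or MetaMap concepts are not covered here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams over a token list, each joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(citation):
    """Build one sparse feature dict that combines several feature sets.

    `citation` is a hypothetical dict with 'title', 'abstract' and
    'journal' keys; these names are assumptions of this sketch.
    """
    text = (citation["title"] + " " + citation["abstract"]).lower()
    # Naive tokenizer: split on whitespace and trim common punctuation.
    tokens = [t.strip(".,;:()") for t in text.split()]
    tokens = [t for t in tokens if t]

    features = Counter()
    # Prefix every feature with its set name so that different feature
    # sets cannot collide when they are merged into one vector.
    for unigram in ngrams(tokens, 1):
        features["uni=" + unigram] += 1
    for bigram in ngrams(tokens, 2):
        features["bi=" + bigram] += 1
    # Citation meta-data as a single categorical feature.
    features["journal=" + citation["journal"]] += 1
    return features


example = {
    "title": "Feature engineering for MEDLINE citation categorization",
    "abstract": "We evaluate text representations for MeSH indexing.",
    "journal": "BMC Bioinformatics",
}
features = extract_features(example)
print(sorted(features.items()))
```

Namespacing the feature names is one simple way to keep the sets separable, so that a single set (for example only the `uni=` features) can still be selected later when comparing individual sets against the combination.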
Keywords: Text categorization, Feature engineering, Biomedical literature, MeSH indexing
ArticleNumber 113
Audience Academic
Author Aronson, Alan R
Mork, James G
Jimeno Yepes, Antonio Jose
Carrillo-de-Albornoz, Jorge
Plaza, Laura
Author_xml – sequence: 1
  givenname: Antonio Jose
  surname: Jimeno Yepes
  fullname: Jimeno Yepes, Antonio Jose
  email: antonio.jimeno@gmail.com
  organization: Department of Computing and Information Systems, The University of Melbourne, National Library of Medicine
– sequence: 2
  givenname: Laura
  surname: Plaza
  fullname: Plaza, Laura
  organization: UNED NLP & IR Group
– sequence: 3
  givenname: Jorge
  surname: Carrillo-de-Albornoz
  fullname: Carrillo-de-Albornoz, Jorge
  organization: UNED NLP & IR Group
– sequence: 4
  givenname: James G
  surname: Mork
  fullname: Mork, James G
  organization: National Library of Medicine
– sequence: 5
  givenname: Alan R
  surname: Aronson
  fullname: Aronson, Alan R
  organization: National Library of Medicine
BackLink https://www.ncbi.nlm.nih.gov/pubmed/25887792$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1007_s40708_016_0053_3
crossref_primary_10_1124_pr_119_017921
crossref_primary_10_1016_j_jbi_2017_08_001
crossref_primary_10_1109_ACCESS_2024_3463717
crossref_primary_10_1177_0165551519860982
crossref_primary_10_1371_journal_pone_0209961
crossref_primary_10_2196_medinform_7059
crossref_primary_10_1371_journal_pone_0207996
ContentType Journal Article
Copyright Jimeno Yepes et al.; licensee BioMed Central. 2015
COPYRIGHT 2015 BioMed Central Ltd.
Copyright_xml – notice: Jimeno Yepes et al.; licensee BioMed Central. 2015
– notice: COPYRIGHT 2015 BioMed Central Ltd.
DOI 10.1186/s12859-015-0539-7
DatabaseName Springer Nature OA Free Journals
CrossRef
MEDLINE
MEDLINE (Ovid)
PubMed
Gale In Context: Science
MEDLINE - Academic
PubMed Central (Full Participant titles)
DatabaseTitle CrossRef
MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
MEDLINE
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Biology
EISSN 1471-2105
EndPage 113
ExternalDocumentID PMC4407321
A541358755
25887792
10_1186_s12859_015_0539_7
Genre Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Intramural
GrantInformation_xml – fundername: Intramural NIH HHS
ISICitedReferencesCount 14
ISSN 1471-2105
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords MeSH indexing
Text categorization
Feature engineering
Biomedical literature
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
OpenAccessLink https://link.springer.com/10.1186/s12859-015-0539-7
PMID 25887792
PQID 1675869858
PQPubID 23479
PageCount 1
PublicationCentury 2000
PublicationDate 2015-04-08
PublicationDateYYYYMMDD 2015-04-08
PublicationDate_xml – month: 04
  year: 2015
  text: 2015-04-08
  day: 08
PublicationDecade 2010
PublicationPlace London
PublicationPlace_xml – name: London
– name: England
PublicationTitle BMC bioinformatics
PublicationTitleAbbrev BMC Bioinformatics
PublicationTitleAlternate BMC Bioinformatics
PublicationYear 2015
Publisher BioMed Central
BioMed Central Ltd
Publisher_xml – name: BioMed Central
– name: BioMed Central Ltd
SourceID pubmedcentral
proquest
gale
pubmed
crossref
springer
SourceType Open Access Repository
Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 113
SubjectTerms Abstracting and Indexing as Topic - methods
Algorithms
Analysis
Artificial Intelligence
Bioinformatics
Biomedical and Life Sciences
Comparative analysis
Computational Biology/Bioinformatics
Computer Appl. in Life Sciences
Data mining
Humans
Information Storage and Retrieval
Knowledge-based analysis
Life Sciences
Medical Subject Headings
MEDLINE
Microarrays
Research Article
Semantics
Title Feature engineering for MEDLINE citation categorization with MeSH
URI https://link.springer.com/article/10.1186/s12859-015-0539-7
https://www.ncbi.nlm.nih.gov/pubmed/25887792
https://www.proquest.com/docview/1675869858
https://pubmed.ncbi.nlm.nih.gov/PMC4407321
Volume 16