Feature engineering for MEDLINE citation categorization with MeSH
| Published in: | BMC bioinformatics Vol. 16; no. 1; p. 113 |
|---|---|
| Main Authors: | Jimeno Yepes, Antonio Jose; Plaza, Laura; Carrillo-de-Albornoz, Jorge; Mork, James G; Aronson, Alan R |
| Format: | Journal Article |
| Language: | English |
| Published: | London: BioMed Central (BioMed Central Ltd), 08.04.2015 |
| Subjects: | |
| ISSN: | 1471-2105 |
| Online Access: | Get full text |
| Abstract | Background: Research in biomedical text categorization has mostly used the bag-of-words representation. Other more sophisticated representations of text based on syntactic, semantic and argumentative properties have been less studied. In this paper, we evaluate the impact of different text representations of biomedical texts as features for reproducing the MeSH annotations of some of the most frequent MeSH headings. In addition to unigrams and bigrams, these features include noun phrases, citation meta-data, citation structure, and semantic annotation of the citations. Results: Traditional features like unigrams and bigrams exhibit strong performance compared to other feature sets. Little or no improvement is obtained when using meta-data or citation structure. Noun phrases are too sparse and thus have lower performance compared to more traditional features. Conceptual annotation of the texts by MetaMap shows similar performance compared to unigrams, but adding concepts from the UMLS taxonomy does not improve the performance of using only mapped concepts. The combination of all the features performs largely better than any individual feature set considered. In addition, this combination improves the performance of a state-of-the-art MeSH indexer. Concerning the machine learning algorithms, we find that those that are more resilient to class imbalance largely obtain better performance. Conclusions: We conclude that even though traditional features such as unigrams and bigrams have strong performance compared to other features, it is possible to combine them to effectively improve the performance of the bag-of-words representation. We have also found that the combination of the learning algorithm and feature sets has an influence in the overall performance of the system. Moreover, using learning algorithms resilient to class imbalance largely improves performance. However, when using a large set of features, consideration needs to be taken with algorithms due to the risk of over-fitting. Specific combinations of learning algorithms and features for individual MeSH headings could further increase the performance of an indexing system. |
|---|---|
| ArticleNumber | 113 |
| Audience | Academic |
| Author | Aronson, Alan R; Mork, James G; Jimeno Yepes, Antonio Jose; Carrillo-de-Albornoz, Jorge; Plaza, Laura |
| Author_xml | – sequence: 1 givenname: Antonio Jose surname: Jimeno Yepes fullname: Jimeno Yepes, Antonio Jose email: antonio.jimeno@gmail.com organization: Department of Computing and Information Systems, The University of Melbourne, National Library of Medicine – sequence: 2 givenname: Laura surname: Plaza fullname: Plaza, Laura organization: UNED NLP & IR Group – sequence: 3 givenname: Jorge surname: Carrillo-de-Albornoz fullname: Carrillo-de-Albornoz, Jorge organization: UNED NLP & IR Group – sequence: 4 givenname: James G surname: Mork fullname: Mork, James G organization: National Library of Medicine – sequence: 5 givenname: Alan R surname: Aronson fullname: Aronson, Alan R organization: National Library of Medicine |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/25887792$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1007_s40708_016_0053_3 crossref_primary_10_1124_pr_119_017921 crossref_primary_10_1016_j_jbi_2017_08_001 crossref_primary_10_1109_ACCESS_2024_3463717 crossref_primary_10_1177_0165551519860982 crossref_primary_10_1371_journal_pone_0209961 crossref_primary_10_2196_medinform_7059 crossref_primary_10_1371_journal_pone_0207996 |
| Cites_doi | 10.1007/BFb0026683 10.1093/bioinformatics/bth227 10.1093/nar/gkh061 10.1186/1471-2105-14-171 10.3115/1273073.1273160 10.1145/2110363.2110450 10.1145/505282.505283 10.6028/NIST.SP.500-255.genomics-u.hospitalgeneva 10.1016/j.ijmedinf.2005.06.007 10.1145/183422.183423 10.3115/1118693.1118704 10.3163/1536-5050.99.2.009 10.1186/1471-2105-14-113 10.1007/978-1-4471-2099-5_20 10.1016/S0306-4573(01)00045-0 10.1197/jamia.M2431 10.1136/jamia.2009.002733 10.5626/JCSE.2012.6.2.151 10.1186/1471-2105-8-423 10.1145/288627.288651 10.1016/j.ijmedinf.2011.02.008 10.1016/j.ijmedinf.2006.05.002 10.1016/j.artmed.2011.06.005 10.1145/215206.215365 10.1186/1471-2105-9-S11-S11 10.1186/1471-2105-14-71 10.1145/1102351.1102399 10.1109/ITA.2013.114 |
| ContentType | Journal Article |
| Copyright | Jimeno Yepes et al.; licensee BioMed Central. 2015 COPYRIGHT 2015 BioMed Central Ltd. |
| Copyright_xml | – notice: Jimeno Yepes et al.; licensee BioMed Central. 2015 – notice: COPYRIGHT 2015 BioMed Central Ltd. |
| DOI | 10.1186/s12859-015-0539-7 |
| DatabaseName | Springer Nature OA Free Journals CrossRef Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed Gale In Context: Science MEDLINE - Academic PubMed Central (Full Participant titles) |
| DatabaseTitle | CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Biology |
| EISSN | 1471-2105 |
| EndPage | 113 |
| ExternalDocumentID | PMC4407321 A541358755 25887792 10_1186_s12859_015_0539_7 |
| Genre | Research Support, Non-U.S. Gov't; Journal Article; Research Support, N.I.H., Intramural |
| GrantInformation_xml | – fundername: Intramural NIH HHS |
| IEDL.DBID | RSV |
| ISICitedReferencesCount | 14 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000353260300001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1471-2105 |
| IngestDate | Tue Nov 04 01:49:49 EST 2025 Thu Sep 04 20:28:21 EDT 2025 Tue Nov 11 11:02:37 EST 2025 Tue Nov 04 18:20:29 EST 2025 Thu Nov 13 16:40:19 EST 2025 Mon Jul 21 06:05:31 EDT 2025 Tue Nov 18 22:20:04 EST 2025 Sat Nov 29 05:39:57 EST 2025 Sat Sep 06 07:27:17 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Keywords | MeSH indexing; Text categorization; Feature engineering; Biomedical literature |
| Language | English |
| License | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
| LinkModel | DirectLink |
| OpenAccessLink | https://link.springer.com/10.1186/s12859-015-0539-7 |
| PMID | 25887792 |
| PQID | 1675869858 |
| PQPubID | 23479 |
| PageCount | 1 |
| ParticipantIDs | pubmedcentral_primary_oai_pubmedcentral_nih_gov_4407321 proquest_miscellaneous_1675869858 gale_infotracmisc_A541358755 gale_infotracacademiconefile_A541358755 gale_incontextgauss_ISR_A541358755 pubmed_primary_25887792 crossref_primary_10_1186_s12859_015_0539_7 crossref_citationtrail_10_1186_s12859_015_0539_7 springer_journals_10_1186_s12859_015_0539_7 |
| PublicationCentury | 2000 |
| PublicationDate | 2015-04-08 |
| PublicationDateYYYYMMDD | 2015-04-08 |
| PublicationDate_xml | – month: 04 year: 2015 text: 2015-04-08 day: 08 |
| PublicationDecade | 2010 |
| PublicationPlace | London |
| PublicationPlace_xml | – name: London – name: England |
| PublicationTitle | BMC bioinformatics |
| PublicationTitleAbbrev | BMC Bioinformatics |
| PublicationTitleAlternate | BMC Bioinformatics |
| PublicationYear | 2015 |
| Publisher | BioMed Central BioMed Central Ltd |
| Publisher_xml | – name: BioMed Central – name: BioMed Central Ltd |
| SSID | ssj0017805 |
| Score | 2.2485871 |
| SourceID | pubmedcentral proquest gale pubmed crossref springer |
| SourceType | Open Access Repository Aggregation Database Index Database Enrichment Source Publisher |
| StartPage | 113 |
| SubjectTerms | Abstracting and Indexing as Topic - methods; Algorithms; Analysis; Artificial Intelligence; Bioinformatics; Biomedical and Life Sciences; Comparative analysis; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Data mining; Humans; Information Storage and Retrieval; Knowledge-based analysis; Life Sciences; Medical Subject Headings; MEDLINE; Microarrays; Research Article; Semantics |
| Title | Feature engineering for MEDLINE citation categorization with MeSH |
| URI | https://link.springer.com/article/10.1186/s12859-015-0539-7 https://www.ncbi.nlm.nih.gov/pubmed/25887792 https://www.proquest.com/docview/1675869858 https://pubmed.ncbi.nlm.nih.gov/PMC4407321 |
| Volume | 16 |
| WOSCitedRecordID | wos000353260300001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
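
The abstract reports that combining feature sets (for example unigrams and bigrams) can outperform any single set, and that learners resilient to class imbalance tend to do better. The sketch below illustrates those two ideas only in outline. The tokenizer, the `uni=`/`bi=` feature prefixes, the function names, and the inverse-frequency weighting formula are all assumptions made for illustration; the record does not describe the authors' actual pipeline or algorithms.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, n):
    """Count contiguous n-grams in a token list."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def combined_features(text):
    """Merge unigram and bigram counts into one feature dictionary.

    Each feature name carries a hypothetical prefix so that the two
    feature sets cannot collide when they are combined.
    """
    tokens = tokenize(text)
    features = Counter()
    for n, prefix in ((1, "uni"), (2, "bi")):
        for gram, count in ngram_counts(tokens, n).items():
            features[f"{prefix}={gram}"] = count
    return features


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * class_count).

    This is one common reweighting heuristic for imbalanced labels; it is
    not necessarily the approach evaluated in the article.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


if __name__ == "__main__":
    feats = combined_features("Text categorization of MEDLINE text")
    print(sorted(feats.items()))
    # A rare positive MeSH heading receives a larger weight than the negatives.
    print(balanced_class_weights([1, 0, 0, 0]))
```

Prefixing feature names keeps the two representations distinguishable after merging, which would also allow further sets (such as noun phrases or mapped concepts) to be added in the same way.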