Convolutional Neural Networks for Biomedical Text Classification: Application in Indexing Biomedical Articles
| Published in: | ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine, Volume 2015, p. 258 |
|---|---|
| Main authors: | Rios, Anthony; Kavuluru, Ramakanth |
| Format: | Journal Article |
| Language: | English |
| Publication details: | United States, 01.09.2015 |
| Abstract | Building high-accuracy text classifiers is an important task in biomedicine, given the wealth of information hidden in unstructured narratives such as research articles and clinical documents. Because of large feature spaces, discriminative approaches such as logistic regression and support vector machines with n-gram and semantic features (e.g., named entities) have traditionally been used for text classification, with additional performance gains typically made through feature selection and ensemble approaches. In this paper, we demonstrate that a more direct approach using convolutional neural networks (CNNs) outperforms several traditional approaches in biomedical text classification, with the specific use case of assigning Medical Subject Headings (MeSH terms) to biomedical articles. Trained annotators at the National Library of Medicine (NLM) assign on average 13 codes to each biomedical article, thus semantically indexing the scientific literature to support NLM's PubMed search system. Recent evidence suggests that effective automated efforts for MeSH term assignment start with binary classifiers for each term. We use CNNs to build such binary text classifiers and achieve an absolute improvement of over 3% in macro F-score on a set of selected hard-to-classify MeSH terms, compared with the best prior results on a public dataset. Additional experiments on 50 high-frequency terms in the dataset also show improvements with CNNs. Our results indicate the strong potential of CNNs for biomedical text classification tasks. |
| Author | Rios, Anthony; Kavuluru, Ramakanth |
| Author affiliations | Rios, Anthony: Department of Computer Science, University of Kentucky, Lexington, Kentucky; Kavuluru, Ramakanth: Division of Biomedical Informatics, Departments of Biostatistics and Computer Science, University of Kentucky, Lexington, Kentucky |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/28736769 (View this record in MEDLINE/PubMed) |
| ContentType | Journal Article |
| DBID | NPM 7X8 |
| DOI | 10.1145/2808719.2808746 |
| DatabaseName | PubMed MEDLINE - Academic |
| DatabaseTitle | PubMed MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic PubMed |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| ExternalDocumentID | 28736769 |
| Genre | Journal Article |
| GrantInformation_xml | – fundername: NCATS NIH HHS grantid: UL1 TR000117 |
| GroupedDBID | NPM 7X8 |
| IEDL.DBID | 7X8 |
| IngestDate | Fri Jul 11 15:05:34 EDT 2025 Thu Jan 02 23:01:01 EST 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Keywords | convolutional neural networks; text classification; medical subject headings |
| Language | English |
| LinkModel | DirectLink |
| PMID | 28736769 |
| PQID | 1923111255 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_1923111255 pubmed_primary_28736769 |
| PublicationCentury | 2000 |
| PublicationDate | 20150901 |
| PublicationDateYYYYMMDD | 2015-09-01 |
| PublicationDate_xml | – month: 9 year: 2015 text: 20150901 day: 1 |
| PublicationDecade | 2010 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine |
| PublicationTitleAlternate | ACM BCB |
| PublicationYear | 2015 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 258 |
| Title | Convolutional Neural Networks for Biomedical Text Classification: Application in Indexing Biomedical Articles |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/28736769; https://www.proquest.com/docview/1923111255 |
| Volume | 2015 |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Convolutional+Neural+Networks+for+Biomedical+Text+Classification%3A+Application+in+Indexing+Biomedical+Articles&rft.jtitle=ACM-BCB+...+...+%3A+the+...+ACM+Conference+on+Bioinformatics%2C+Computational+Biology+and+Biomedicine.+ACM+Conference+on+Bioinformatics%2C+Computational+Biology+and+Biomedicine&rft.au=Rios%2C+Anthony&rft.au=Kavuluru%2C+Ramakanth&rft.date=2015-09-01&rft.volume=2015&rft.spage=258&rft_id=info:doi/10.1145%2F2808719.2808746&rft_id=info%3Apmid%2F28736769&rft_id=info%3Apmid%2F28736769&rft.externalDocID=28736769 |
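The abstract reports its headline result as an absolute gain in macro F-score across per-term binary classifiers. As a rough illustration only, the metric can be sketched in Python. This assumes the common definition of macro F-score as the unweighted mean of per-term F1 scores, and also assumes a zero-division convention. The record states neither the authors' exact formula nor how they handle undefined scores. The term labels and label vectors below are hypothetical and are not data from the paper.

```python
from typing import Dict, List, Tuple


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for one binary classifier (1 = term assigned, 0 = not assigned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # Assumed convention: no true positives -> F1 = 0 (avoids division by zero).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_term: Dict[str, Tuple[List[int], List[int]]]) -> float:
    """Unweighted mean of per-term F1 scores: every term counts equally."""
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in per_term.values()]
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for two terms over four articles.
example = {
    "term_a": ([1, 1, 0, 0], [1, 0, 0, 0]),  # P = 1.0, R = 0.5, F1 = 2/3
    "term_b": ([1, 0, 1, 0], [1, 1, 1, 0]),  # P = 2/3, R = 1.0, F1 = 0.8
}
print(round(macro_f1(example), 4))  # → 0.7333
```

Every term contributes equally, however often it occurs, so a macro average is sensitive to performance on rare or hard-to-classify terms. On this reading, an "absolute improvement of over 3%" is a gain of more than 3 points on the 0-100 scale (e.g., 0.70 to 0.73), not a relative increase.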