Convolutional Neural Networks for Biomedical Text Classification: Application in Indexing Biomedical Articles

Published in: ACM-BCB ... ... : the ... ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM Conference on Bioinformatics, Computational Biology and Biomedicine, Volume 2015; p. 258
Main authors: Rios, Anthony; Kavuluru, Ramakanth
Format: Journal Article
Language: English
Publication details: United States, 1 September 2015
Abstract: Building high-accuracy text classifiers is an important task in biomedicine, given the wealth of information hidden in unstructured narratives such as research articles and clinical documents. Because of the large feature spaces involved, text classification has traditionally relied on discriminative approaches such as logistic regression and support vector machines with n-gram and semantic features (e.g., named entities), with additional performance gains typically obtained through feature selection and ensemble approaches. In this paper, we demonstrate that a more direct approach using convolutional neural networks (CNNs) outperforms several traditional approaches in biomedical text classification, with the specific use case of assigning Medical Subject Headings (MeSH terms) to biomedical articles. Trained annotators at the National Library of Medicine (NLM) assign on average 13 codes to each biomedical article, thus semantically indexing scientific literature to support NLM's PubMed search system. Recent evidence suggests that effective automated efforts for MeSH term assignment start with binary classifiers for each term. We use CNNs to build binary text classifiers and achieve an absolute improvement of over 3% in macro F-score on a set of selected hard-to-classify MeSH terms compared with the best prior results on a public dataset. Additional experiments on 50 high-frequency terms in the dataset also show improvements with CNNs. Our results indicate the strong potential of CNNs in biomedical text classification tasks.
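The abstract reports results as a macro F-score over one binary classifier per MeSH term. As a rough illustration only, the sketch below computes macro F under one common definition (the unweighted mean of per-term F1 scores). The record does not state which definition the paper uses, so this is an assumption, and the term names, labels, and function names are hypothetical toy values rather than data from the paper.

```python
from typing import Dict, List, Tuple


def f1_binary(gold: List[int], pred: List[int]) -> float:
    """F1 for one binary classifier; 0.0 when there are no true positives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_term: Dict[str, Tuple[List[int], List[int]]]) -> float:
    """Unweighted mean of per-term F1 (assumed definition of macro F)."""
    scores = [f1_binary(gold, pred) for gold, pred in per_term.values()]
    return sum(scores) / len(scores)


# Hypothetical gold and predicted labels for two terms over four articles.
toy = {
    "Term A": ([1, 1, 0, 0], [1, 0, 0, 0]),  # F1 = 2/3
    "Term B": ([1, 0, 1, 0], [1, 0, 1, 1]),  # F1 = 0.8
}
print(round(macro_f1(toy), 3))  # 0.733
```

Because each term contributes equally to the mean, a rare term with a weak classifier lowers the macro score as much as a frequent one would, which is one reason the measure suits evaluation on hard-to-classify terms.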
Author affiliations:
Rios, Anthony: Department of Computer Science, University of Kentucky, Lexington, Kentucky
Kavuluru, Ramakanth: Division of Biomedical Informatics, Depts. of Biostatistics and Computer Science, University of Kentucky, Lexington, Kentucky
DOI: 10.1145/2808719.2808746
Funding: NCATS NIH HHS, grant UL1 TR000117
Keywords: convolutional neural networks; text classification; medical subject headings
PMID: 28736769 (https://www.ncbi.nlm.nih.gov/pubmed/28736769)