Identifying Symptom Information in Clinical Notes Using Natural Language Processing
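| Published in: | Nursing research (New York) Vol. 70; no. 3; p. 173 |
|---|---|
| Main authors: | Koleck, Theresa A; Tatonetti, Nicholas P; Bakken, Suzanne; Mitha, Shazia; Henderson, Morgan M; George, Maureen; Miaskowski, Christine; Smaldone, Arlene; Topaz, Maxim |
| Format: | Journal Article |
| Language: | English |
| Publication details: | United States, 1 May 2021 |
| Subjects: | Constipation - diagnosis; Depression - diagnosis; Electronic Health Records - statistics & numerical data; Fatigue - diagnosis; Humans; Natural Language Processing; Pattern Recognition, Automated - methods; Sleep Wake Disorders - diagnosis; Symptom Assessment - nursing; Tachycardia - diagnosis; Vocabulary, Controlled |
| ISSN: | 1538-9847 |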
| Abstract | Symptoms are a core concept of nursing interest. Large-scale secondary data reuse of notes in electronic health records (EHRs) has the potential to increase the quantity and quality of symptom research. However, the symptom language used in clinical notes is complex. A need exists for methods designed specifically to identify and study symptom information from EHR notes.
We aim to describe a method that combines standardized vocabularies, clinical expertise, and natural language processing to generate comprehensive symptom vocabularies and identify symptom information in EHR notes. We piloted this method with five diverse symptom concepts: constipation, depressed mood, disturbed sleep, fatigue, and palpitations.
First, we obtained synonym lists for each pilot symptom concept from the Unified Medical Language System. Then, we used two large bodies of text (clinical notes from Columbia University Irving Medical Center and PubMed abstracts containing Medical Subject Headings or key words related to the pilot symptoms) to further expand our initial vocabulary of synonyms for each pilot symptom concept. We used NimbleMiner, an open-source natural language processing tool, to accomplish these tasks and evaluated NimbleMiner symptom identification performance by comparison to a manually annotated set of nurse- and physician-authored common EHR note types.
Compared to the baseline Unified Medical Language System synonym lists, we identified up to 11 times more additional synonym words or expressions, including abbreviations, misspellings, and unique multiword combinations, for each symptom concept. Natural language processing system symptom identification performance was excellent.
Using our comprehensive symptom vocabularies and NimbleMiner to label symptoms in clinical notes produced excellent performance metrics. The ability to extract symptom information from EHR notes in an accurate and scalable manner has the potential to greatly facilitate symptom science research. |
|---|---|
| AbstractList | BACKGROUND: Symptoms are a core concept of nursing interest. Large-scale secondary data reuse of notes in electronic health records (EHRs) has the potential to increase the quantity and quality of symptom research. However, the symptom language used in clinical notes is complex. A need exists for methods designed specifically to identify and study symptom information from EHR notes. OBJECTIVES: We aim to describe a method that combines standardized vocabularies, clinical expertise, and natural language processing to generate comprehensive symptom vocabularies and identify symptom information in EHR notes. We piloted this method with five diverse symptom concepts: constipation, depressed mood, disturbed sleep, fatigue, and palpitations. METHODS: First, we obtained synonym lists for each pilot symptom concept from the Unified Medical Language System. Then, we used two large bodies of text (clinical notes from Columbia University Irving Medical Center and PubMed abstracts containing Medical Subject Headings or key words related to the pilot symptoms) to further expand our initial vocabulary of synonyms for each pilot symptom concept. We used NimbleMiner, an open-source natural language processing tool, to accomplish these tasks and evaluated NimbleMiner symptom identification performance by comparison to a manually annotated set of nurse- and physician-authored common EHR note types. RESULTS: Compared to the baseline Unified Medical Language System synonym lists, we identified up to 11 times more additional synonym words or expressions, including abbreviations, misspellings, and unique multiword combinations, for each symptom concept. Natural language processing system symptom identification performance was excellent. DISCUSSION: Using our comprehensive symptom vocabularies and NimbleMiner to label symptoms in clinical notes produced excellent performance metrics. The ability to extract symptom information from EHR notes in an accurate and scalable manner has the potential to greatly facilitate symptom science research. |
| Author | Bakken, Suzanne; George, Maureen; Koleck, Theresa A; Mitha, Shazia; Miaskowski, Christine; Henderson, Morgan M; Topaz, Maxim; Tatonetti, Nicholas P; Smaldone, Arlene |
| Author_xml | – sequence: 1 givenname: Theresa A surname: Koleck fullname: Koleck, Theresa A – sequence: 2 givenname: Nicholas P surname: Tatonetti fullname: Tatonetti, Nicholas P – sequence: 3 givenname: Suzanne surname: Bakken fullname: Bakken, Suzanne – sequence: 4 givenname: Shazia surname: Mitha fullname: Mitha, Shazia – sequence: 5 givenname: Morgan M surname: Henderson fullname: Henderson, Morgan M – sequence: 6 givenname: Maureen surname: George fullname: George, Maureen – sequence: 7 givenname: Christine surname: Miaskowski fullname: Miaskowski, Christine – sequence: 8 givenname: Arlene surname: Smaldone fullname: Smaldone, Arlene – sequence: 9 givenname: Maxim surname: Topaz fullname: Topaz, Maxim |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/33196504$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1016_j_ijdrr_2024_104951 crossref_primary_10_1016_j_outlook_2022_04_004 crossref_primary_10_1093_eurjcn_zvad068 crossref_primary_10_1177_10998004221121109 crossref_primary_10_1038_s41598_024_51615_5 crossref_primary_10_1097_NCC_0000000000001287 crossref_primary_10_1177_10775595231194599 crossref_primary_10_1002_cam4_7253 crossref_primary_10_1093_jamiaopen_ooae082 crossref_primary_10_1016_j_ijnurstu_2021_104153 crossref_primary_10_1016_j_identj_2024_06_015 crossref_primary_10_2196_32903 crossref_primary_10_1097_CIN_0000000000000967 crossref_primary_10_34067_KID_0000000694 crossref_primary_10_1111_jnu_13038 crossref_primary_10_7759_cureus_65792 crossref_primary_10_1016_j_soncn_2023_151428 crossref_primary_10_1016_j_ienj_2023_101272 crossref_primary_10_1093_jamia_ocad079 crossref_primary_10_3390_nursrep15060218 crossref_primary_10_1002_nur_22190 crossref_primary_10_1177_10547738241292657 crossref_primary_10_1016_j_jamda_2023_09_006 crossref_primary_10_1097_ANS_0000000000000423 crossref_primary_10_1371_journal_pone_0329963 crossref_primary_10_7586_jkbns_25_035 crossref_primary_10_1097_NNR_0000000000000586 crossref_primary_10_1016_j_ijmedinf_2024_105544 crossref_primary_10_1136_bmjoq_2023_002295 crossref_primary_10_1038_s41598_024_56324_7 crossref_primary_10_1002_eng2_70365 crossref_primary_10_1038_s41746_024_01121_9 |
| ContentType | Journal Article |
| Copyright | Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved. |
| Copyright_xml | – notice: Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved. |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1097/NNR.0000000000000488 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Nursing Social Sciences (General) |
| EISSN | 1538-9847 |
| ExternalDocumentID | 33196504 |
| Genre | Journal Article Research Support, N.I.H., Extramural |
| GrantInformation_xml | – fundername: NIGMS NIH HHS grantid: R35 GM131905 – fundername: NINR NIH HHS grantid: K99 NR017651 – fundername: NINR NIH HHS grantid: P30 NR016587 – fundername: NINR NIH HHS grantid: R00 NR017651 |
| ID | FETCH-LOGICAL-c4813-9259d3f2deb276faede91d72f32ad98cd99dd7658a0410099c190fd74dafa7cf2 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 38 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=00006199-202105000-00003&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1538-9847 |
| IngestDate | Sun Nov 09 12:54:47 EST 2025; Mon Jul 21 05:34:05 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| Language | English |
| License | Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved. |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c4813-9259d3f2deb276faede91d72f32ad98cd99dd7658a0410099c190fd74dafa7cf2 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/9109773 |
| PMID | 33196504 |
| PQID | 2461396068 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_2461396068 pubmed_primary_33196504 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-05-01 |
| PublicationDateYYYYMMDD | 2021-05-01 |
| PublicationDate_xml | – month: 05 year: 2021 text: 2021-05-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | Nursing research (New York) |
| PublicationTitleAlternate | Nurs Res |
| PublicationYear | 2021 |
| SSID | ssj0004623 |
| Score | 2.469641 |
| Snippet | Symptoms are a core concept of nursing interest. Large-scale secondary data reuse of notes in electronic health records (EHRs) has the potential to increase... |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 173 |
| SubjectTerms | Constipation - diagnosis; Depression - diagnosis; Electronic Health Records - statistics & numerical data; Fatigue - diagnosis; Humans; Natural Language Processing; Pattern Recognition, Automated - methods; Sleep Wake Disorders - diagnosis; Symptom Assessment - nursing; Tachycardia - diagnosis; Vocabulary, Controlled |
| Title | Identifying Symptom Information in Clinical Notes Using Natural Language Processing |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/33196504 https://www.proquest.com/docview/2461396068 |
| Volume | 70 |
| WOSCitedRecordID | wos00006199-202105000-00003&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Identifying+Symptom+Information+in+Clinical+Notes+Using+Natural+Language+Processing&rft.jtitle=Nursing+research+%28New+York%29&rft.au=Koleck%2C+Theresa+A&rft.au=Tatonetti%2C+Nicholas+P&rft.au=Bakken%2C+Suzanne&rft.au=Mitha%2C+Shazia&rft.date=2021-05-01&rft.eissn=1538-9847&rft.volume=70&rft.issue=3&rft.spage=173&rft_id=info:doi/10.1097%2FNNR.0000000000000488&rft_id=info%3Apmid%2F33196504&rft_id=info%3Apmid%2F33196504&rft.externalDocID=33196504 |
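
The abstract describes labeling notes with an expanded symptom vocabulary and then comparing the labels with a manually annotated set. The Python sketch below illustrates that general idea only. It is not NimbleMiner and does not reproduce the authors' pipeline: the vocabularies, the function names (`compile_patterns`, `label_note`, `precision_recall_f1`), and the simple whole-word matching rule are all assumptions made for illustration, and the sketch ignores negation handling entirely.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical, hand-written vocabularies for illustration only. They are NOT
# the vocabularies generated in the study; they merely mimic the kinds of
# entries the abstract mentions (abbreviations, misspellings, multiword terms).
VOCAB: Dict[str, Set[str]] = {
    "constipation": {"constipation", "constipated", "no bm", "hard stools"},
    "fatigue": {"fatigue", "fatigued", "tired", "exhausted", "fatige"},
}


def compile_patterns(vocab: Dict[str, Set[str]]) -> Dict[str, "re.Pattern[str]"]:
    """Build one case-insensitive, whole-word regex per symptom concept."""
    patterns = {}
    for concept, terms in vocab.items():
        # Longer terms first so multiword expressions are tried before single words.
        ordered = sorted(terms, key=len, reverse=True)
        alternatives = "|".join(re.escape(term) for term in ordered)
        patterns[concept] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return patterns


def label_note(note: str, patterns: Dict[str, "re.Pattern[str]"]) -> Set[str]:
    """Return the set of symptom concepts whose vocabulary matches the note."""
    return {concept for concept, pattern in patterns.items() if pattern.search(note)}


def precision_recall_f1(predicted: List[bool], gold: List[bool]) -> Tuple[float, float, float]:
    """Compare predicted note-level labels with manual (gold) annotations."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```

Calling `label_note(note, compile_patterns(VOCAB))` on each note yields note-level labels that can be passed, one concept at a time, to `precision_recall_f1` together with the manual annotations. A literal matcher like this one would also flag negated mentions (for example, "denies fatigue"), which is one reason a production system needs more than vocabulary lookup.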