Identifying Symptom Information in Clinical Notes Using Natural Language Processing

Published in: Nursing research (New York), Volume 70, Issue 3, p. 173
Main authors: Koleck, Theresa A, Tatonetti, Nicholas P, Bakken, Suzanne, Mitha, Shazia, Henderson, Morgan M, George, Maureen, Miaskowski, Christine, Smaldone, Arlene, Topaz, Maxim
Format: Journal Article
Language: English
Publication details: United States, 1 May 2021
ISSN: 1538-9847
Abstract
Background: Symptoms are a core concept of nursing interest. Large-scale secondary data reuse of notes in electronic health records (EHRs) has the potential to increase the quantity and quality of symptom research. However, the symptom language used in clinical notes is complex. A need exists for methods designed specifically to identify and study symptom information from EHR notes.
Objectives: We aim to describe a method that combines standardized vocabularies, clinical expertise, and natural language processing to generate comprehensive symptom vocabularies and identify symptom information in EHR notes. We piloted this method with five diverse symptom concepts: constipation, depressed mood, disturbed sleep, fatigue, and palpitations.
Methods: First, we obtained synonym lists for each pilot symptom concept from the Unified Medical Language System. Then, we used two large bodies of text (clinical notes from Columbia University Irving Medical Center and PubMed abstracts containing Medical Subject Headings or key words related to the pilot symptoms) to further expand our initial vocabulary of synonyms for each pilot symptom concept. We used NimbleMiner, an open-source natural language processing tool, to accomplish these tasks and evaluated NimbleMiner symptom identification performance by comparison to a manually annotated set of nurse- and physician-authored common EHR note types.
Results: Compared to the baseline Unified Medical Language System synonym lists, we identified up to 11 times more additional synonym words or expressions, including abbreviations, misspellings, and unique multiword combinations, for each symptom concept. Natural language processing system symptom identification performance was excellent.
Discussion: Using our comprehensive symptom vocabularies and NimbleMiner to label symptoms in clinical notes produced excellent performance metrics. The ability to extract symptom information from EHR notes in an accurate and scalable manner has the potential to greatly facilitate symptom science research.
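
The general idea in the abstract (label notes with a per-concept synonym vocabulary, then compare the labels to manual annotation) can be illustrated with a minimal Python sketch. It does not reproduce NimbleMiner or the authors' pipeline. The vocabulary terms, the example notes, and the function names `compile_vocab`, `label_note`, and `precision_recall_f1` are all assumptions invented for illustration, and the matcher deliberately ignores negation.

```python
import re

# Hypothetical vocabulary: invented terms for illustration only,
# not the synonym lists produced in the study.
VOCAB = {
    "fatigue": ["fatigue", "fatigued", "tired", "low energy"],
    "constipation": ["constipation", "constipated", "no bm"],
}


def compile_vocab(vocab):
    """Build one case-insensitive, whole-word pattern per symptom concept."""
    patterns = {}
    for concept, terms in vocab.items():
        # Longest terms first so multiword expressions are tried before shorter ones.
        ordered = sorted(terms, key=len, reverse=True)
        alternatives = "|".join(re.escape(term) for term in ordered)
        patterns[concept] = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return patterns


def label_note(note, patterns):
    """Return the set of symptom concepts whose vocabulary matches the note."""
    return {concept for concept, pattern in patterns.items() if pattern.search(note)}


def precision_recall_f1(predicted, gold):
    """Micro-averaged precision, recall, and F1 over (note, concept) labels."""
    tp = fp = fn = 0
    for pred, truth in zip(predicted, gold):
        tp += len(pred & truth)
        fp += len(pred - truth)
        fn += len(truth - pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up notes and made-up "manual annotations", purely for demonstration.
    notes = [
        "Pt reports feeling tired and constipated x3 days.",
        "No BM since admission.",
        "Denies fatigue.",  # negated mention: this naive matcher will still flag it
    ]
    gold = [{"fatigue", "constipation"}, {"constipation"}, set()]

    patterns = compile_vocab(VOCAB)
    predicted = [label_note(note, patterns) for note in notes]
    print(predicted)
    print(precision_recall_f1(predicted, gold))
```

The third toy note shows why a plain vocabulary match is only a starting point: a negated mention counts as a false positive here, so a real system would need some handling of negation and context.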
PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/33196504
Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved.
DOI: 10.1097/NNR.0000000000000488
Genre: Journal Article; Research Support, N.I.H., Extramural
Funding:
NIGMS NIH HHS, grant R35 GM131905
NINR NIH HHS, grant K99 NR017651
NINR NIH HHS, grant P30 NR016587
NINR NIH HHS, grant R00 NR017651
Open-access link: https://www.ncbi.nlm.nih.gov/pmc/articles/9109773
PMID: 33196504
Subject terms:
Constipation - diagnosis
Depression - diagnosis
Electronic Health Records - statistics & numerical data
Fatigue - diagnosis
Humans
Natural Language Processing
Pattern Recognition, Automated - methods
Sleep Wake Disorders - diagnosis
Symptom Assessment - nursing
Tachycardia - diagnosis
Vocabulary, Controlled