A post-processing algorithm for building longitudinal medication dose data from extracted medication information using natural language processing from electronic health records

Bibliographic details
Published in:	bioRxiv
Main authors:	Mcneer, Elizabeth; Beck, Cole; Weeks, Hannah L; Williams, Michael L; James, Nathan T; Choi, Leena
Format:	Paper
Language:	English
Publication details:	Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 03.02.2020
Edition:	1.3
Subjects:	Algorithms; Bioinformatics; Electronic health records; Electronic medical records; Information processing; Lamotrigine; Language; Natural language processing; Tacrolimus; post-processing algorithm; real world data; natural language processing; electronic health record; medication extraction
ISSN:	2692-8205

Description
Summary:	Objective: We developed a post-processing algorithm to convert raw natural language processing (NLP) output from electronic health records (EHRs) into a format usable for analysis. The algorithm was developed specifically to create datasets for medication-based studies. Materials and Methods: The algorithm was developed using output from two NLP systems, MedXN and medExtractR. We extracted medication information from deidentified clinical notes in Vanderbilt's EHR system for two medications, tacrolimus and lamotrigine. The algorithm consists of two parts. Part I parses the raw NLP output and connects entities together. Part II removes redundancies and calculates dose intake and daily dose. We evaluated each part against human-determined gold standards generated from approximately 300 records from 10 subjects for each medication and each NLP system. Results: The algorithm performed well. For MedXN, the F-measures were at or above 0.99 for Part I and at or above 0.97 for Part II. For medExtractR, the F-measures were 1.00 for Part I and at or above 0.98 for Part II. Discussion: Our post-processing algorithm was developed separately from any NLP system, making it easier to modify and generalize to other systems. It performed well at converting NLP output to analyzable data, but it cannot perform well in certain cases, such as when the NLP system extracts incorrect information. Conclusion: Our post-processing algorithm provides a way to convert raw NLP output into a form that is useful for medication-based studies, creating more opportunities to use EHR data in diverse studies.
DOI:	10.1101/775015
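
The summary says that Part II removes redundancies and calculates dose intake and daily dose, but it does not give the formulas. The sketch below is only an illustration under stated assumptions: that dose intake is strength multiplied by the number of units taken, and that daily dose is dose intake multiplied by the number of intakes per day. The `Mention` record, its field names, and the helper functions are hypothetical and are not taken from the paper, MedXN, or medExtractR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One connected medication mention (hypothetical layout, not the paper's)."""
    note_id: str
    strength_mg: float        # strength of one unit, e.g. a 100 mg tablet
    dose_amount: float        # number of units taken per intake
    frequency_per_day: float  # number of intakes per day


def remove_redundancies(mentions):
    """Drop exact duplicate mentions, keeping first-seen order."""
    # Frozen dataclasses are hashable, so dict keys deduplicate them.
    return list(dict.fromkeys(mentions))


def dose_intake(mention):
    """Assumed definition: amount taken at one time, in mg."""
    return mention.strength_mg * mention.dose_amount


def daily_dose(mention):
    """Assumed definition: amount taken over one day, in mg."""
    return dose_intake(mention) * mention.frequency_per_day


# Made-up example records: the first two are identical and collapse to one.
mentions = [
    Mention("note-1", 100.0, 2.0, 2.0),
    Mention("note-1", 100.0, 2.0, 2.0),
    Mention("note-2", 1.0, 3.0, 2.0),
]
unique_mentions = remove_redundancies(mentions)
```

Calling `daily_dose` on each element of `unique_mentions` then gives one value per distinct mention. A real implementation would also need rules for mentions with missing strength or frequency, which this sketch does not attempt.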