Preprocessing narrative texts in electronic medical records to identify hospital adverse events: A scoping review.
| Title: | Preprocessing narrative texts in electronic medical records to identify hospital adverse events: A scoping review. |
|---|---|
| Authors: | Jafarpour H; Concordia University, Gina Cody School of Engineering and Computer Science, Concordia Institute for Information Systems Engineering, 1515 Sainte Catherine West, Montreal, H3G 2W1, Quebec, Canada. Electronic address: hamed.jafarpour@concordia.ca., Wu G; University of Calgary, Department of Community Health Sciences, Cumming School of Medicine, 2500 University Drive NW, Calgary, T2N 1N4, Alberta, Canada. Electronic address: Guosong.wu@ucalgary.ca., Cheligeer CK; University of Calgary, Department of Community Health Sciences, Cumming School of Medicine, 2500 University Drive NW, Calgary, T2N 1N4, Alberta, Canada. Electronic address: cheligeerken@ucalgary.ca., Yan J; Concordia University, Gina Cody School of Engineering and Computer Science, Concordia Institute for Information Systems Engineering, 1515 Sainte Catherine West, Montreal, H3G 2W1, Quebec, Canada. Electronic address: jun.yan@concordia.ca., Xu Y; University of Calgary, Department of Community Health Sciences, Cumming School of Medicine, 2500 University Drive NW, Calgary, T2N 1N4, Alberta, Canada. Electronic address: yuxu@ucalgary.ca., Southern DA; University of Calgary, Department of Community Health Sciences, Cumming School of Medicine, 2500 University Drive NW, Calgary, T2N 1N4, Alberta, Canada. Electronic address: dasouthe@ucalgary.ca., Eastwood CA; University of Calgary, Department of Community Health Sciences, Cumming School of Medicine, 2500 University Drive NW, Calgary, T2N 1N4, Alberta, Canada. Electronic address: caeastwo@ucalgary.ca., Zeng Y; Concordia University, Gina Cody School of Engineering and Computer Science, Concordia Institute for Information Systems Engineering, 1515 Sainte Catherine West, Montreal, H3G 2W1, Quebec, Canada. Electronic address: yong.zeng@concordia.ca., Quan H; University of Calgary, Department of Community Health Sciences, Cumming School of Medicine, 2500 University Drive NW, Calgary, T2N 1N4, Alberta, Canada. Electronic address: hquan@ucalgary.ca. |
| Source: | Artificial Intelligence in Medicine [Artif Intell Med] 2025 Dec; Vol. 170, pp. 103281. Date of Electronic Publication: 2025 Oct 08. |
| Publication type: | Journal Article; Scoping Review |
| Language: | English |
| Journal info: | Publisher: Elsevier Science Publishing; Country of Publication: Netherlands; NLM ID: 8915031; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1873-2860 (Electronic); Linking ISSN: 0933-3657; NLM ISO Abbreviation: Artif Intell Med; Subsets: MEDLINE |
| Imprint Name(s): | Publication: Amsterdam : Elsevier Science Publishing; Original Publication: Tecklenburg, Federal Republic of Germany : Burgverlag, c1989- |
| MeSH terms: | Electronic Health Records*; Medical Errors*; Natural Language Processing*; Humans; Machine Learning; Narration |
| Competing interests: | Declaration of competing interest: The authors declare the following financial interests/personal relationships, which may be considered potential competing interests: Hude Quan’s report received funding from the Canadian Institutes of Health Research. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. |
| Abstract: | Background: Narrative electronic medical records (EMR), which include textual notes created by clinicians in healthcare settings, are a significant resource for documenting many facets of patient care. This form of text has distinctive characteristics, such as grammatically incorrect sentences, abbreviations, frequent acronyms, special characters with particular meanings, negation expressions, and sporadic misspellings. As a result, a primary goal in processing these notes is to apply effective preprocessing techniques that improve data quality and ensure consistency across all entries. Recent advances in natural language processing (NLP), machine learning (ML), and large language models (LLMs) have prompted researchers to leverage narrative EMR for the detection of hospital adverse events (HAE).<br />Methods: The scoping review adhered to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. A scoping review protocol was developed and used to guide the research process; it outlined the eligibility criteria, information sources, search strategies, data management, selection process, data collection procedures, data items, outcomes and prioritization, data synthesis, and meta-bias considerations. The search strategy was applied to nine engineering and medical electronic databases.<br />Results: Of the 3,264 studies retrieved, 48 unique studies were included in the review, and responses to the research questions were systematically extracted from them. The review identified challenges associated with preprocessing narrative EMR text for HAE identification, as well as three research gaps, namely the need for (1) a pipeline to preprocess narrative EMR for HAE identification, (2) a robust system capable of managing the large volume of narrative EMR data, and (3) a temporal event system, all of which are essential for effective HAE detection. The study also underscored the essential role of preprocessing in improving HAE detection performance. It emphasized three preprocessing tasks that significantly affect HAE detection performance: extracting N-grams from clinical text, normalizing these N-grams through lemmatization and/or stemming, and extracting semantic features. Although LLM-based systems inherently incorporate tokenization and normalization, it remains crucial during preprocessing to address features that are semantically relevant to the specific type of HAE.<br />Conclusion: This scoping review provides insights for researchers focused on HAE detection using narrative EMR data. It elucidates how preprocessing tasks can improve HAE detection performance and draws attention to neglected research gaps in the field, which will require further investigation in future research. (Copyright © 2025 The Authors. Published by Elsevier B.V. All rights reserved.) |
| Contributed Indexing: | Keywords: Clinical text; Hospital adverse event; Large language model; Narrative EMR; Natural language processing; Preprocessing |
| Entry Date(s): | Date Created: 20251010 Date Completed: 20251030 Latest Revision: 20251120 |
| Update Code: | 20251121 |
| DOI: | 10.1016/j.artmed.2025.103281 |
| PMID: | 41072367 |
| Database: | MEDLINE |
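
The abstract lists several preprocessing tasks. These include handling abbreviations and negation expressions, extracting N-grams, and normalizing them by stemming or lemmatization. The following is a minimal Python sketch of how these steps might be chained for a single note. It is an illustration under stated assumptions, not the authors' pipeline or any method reported in the reviewed studies. The abbreviation dictionary, the negation cue and terminator lists, the five-token negation window, and the `crude_stem` suffix-stripping rule are all hypothetical. They stand in for, respectively, a curated clinical lexicon, a NegEx-style trigger set, and a real stemmer or lemmatizer such as the Porter stemmer.

```python
import re
from collections import Counter

# Hypothetical abbreviation lexicon; a real pipeline would use a curated clinical resource.
ABBREVIATIONS = {
    "pt": "patient",
    "sob": "shortness of breath",
    "hx": "history",
    "uti": "urinary tract infection",
}

# Hypothetical negation cues and scope terminators, loosely in the spirit of NegEx.
NEGATION_CUES = {"no", "denies", "without", "negative"}
SCOPE_TERMINATORS = {"but", "however", "."}


def tokenize(text: str) -> list[str]:
    """Lowercase the note and keep alphanumeric tokens plus sentence-ending periods."""
    return re.findall(r"[a-z0-9]+|\.", text.lower())


def expand_abbreviations(tokens: list[str]) -> list[str]:
    """Replace known abbreviations with their (possibly multi-word) expansions."""
    expanded = []
    for tok in tokens:
        expanded.extend(ABBREVIATIONS.get(tok, tok).split())
    return expanded


def mark_negation(tokens: list[str], window: int = 5) -> list[str]:
    """Prefix up to `window` tokens after a cue with NEG_, stopping at a terminator."""
    marked, remaining = [], 0
    for tok in tokens:
        if tok in SCOPE_TERMINATORS:
            remaining = 0
            marked.append(tok)
        elif tok in NEGATION_CUES:
            remaining = window
            marked.append(tok)
        elif remaining > 0:
            marked.append("NEG_" + tok)
            remaining -= 1
        else:
            marked.append(tok)
    return marked


def crude_stem(token: str) -> str:
    """Naive suffix stripping; a hypothetical stand-in for a real stemmer or lemmatizer."""
    if token.endswith("ss"):
        return token
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(token: str) -> str:
    """Stem the word while preserving any NEG_ prefix."""
    if token.startswith("NEG_"):
        return "NEG_" + crude_stem(token[4:])
    return crude_stem(token)


def split_sentences(tokens: list[str]) -> list[list[str]]:
    """Split the token stream on periods so N-grams do not cross sentences."""
    sentences, current = [], []
    for tok in tokens:
        if tok == ".":
            if current:
                sentences.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        sentences.append(current)
    return sentences


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return contiguous n-token sequences joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text: str, max_n: int = 2) -> Counter:
    """Turn one narrative note into a bag of 1..max_n-gram features."""
    # Negation is marked before stemming so cues such as "denies" are still recognized.
    tokens = mark_negation(expand_abbreviations(tokenize(text)))
    tokens = [normalize(t) for t in tokens]
    features = Counter()
    for sentence in split_sentences(tokens):
        for n in range(1, max_n + 1):
            features.update(ngrams(sentence, n))
    return features
```

For example, `preprocess("Pt denies SOB. Hx of UTI, reports fever.")` yields features such as `NEG_breath`, `urinary tract`, and `report fever`. As a result, a negated finding is counted as a different feature from an asserted one. Negation is marked before stemming so that cue words are matched in their surface form. N-grams are built per sentence so that no feature spans a sentence boundary.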