Communicating exploratory unsupervised machine learning analysis in age clustering for paediatric disease
| Published in: | BMJ health & care informatics Vol. 31; No. 1; p. e100963 |
|---|---|
| Main authors: | , , , , , , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | England: BMJ Publishing Group Ltd, 29 July 2024 |
| Subjects: | |
| ISSN: | 2632-1009 |
| Online access: | Full text |
| Abstract: | Background: Despite the increasing availability of electronic healthcare record (EHR) data and wide availability of plug-and-play machine learning (ML) Application Programming Interfaces, the adoption of data-driven decision-making within routine hospital workflows has thus far remained limited. Through the lens of deriving clusters of diagnoses by age, this study investigated the type of ML analysis that can be performed using EHR data and how results could be communicated to lay stakeholders. Methods: Observational EHR data from a tertiary paediatric hospital, containing 61 522 unique patients and 3315 unique ICD-10 diagnosis codes after preprocessing, were used. K-means clustering was applied to identify age distributions of patient diagnoses. The final model was selected using quantitative metrics and expert assessment of the clinical validity of the clusters. Additionally, uncertainty over preprocessing decisions was analysed. Findings: Four age clusters of diseases were identified, broadly aligning to ages between: 0 and 1; 1 and 5; 5 and 13; 13 and 18. Diagnoses within the clusters aligned to existing knowledge regarding the propensity of presentation at different ages, and sequential clusters presented known disease progressions. The results validated similar methodologies within the literature. The impact of uncertainty induced by preprocessing decisions was large at the level of individual diagnoses but not at a population level. Strategies for mitigating, or communicating, this uncertainty were successfully demonstrated. Conclusion: Unsupervised ML applied to EHR data identifies clinically relevant age distributions of diagnoses which can augment existing decision making. However, biases within healthcare datasets dramatically impact results if not appropriately mitigated or communicated. |
| Bibliography: | Additional supplemental material is published online only. To view, please visit the journal online (https://doi.org/10.1136/bmjhci-2023-100963). Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise. None declared. |
| DOI: | 10.1136/bmjhci-2023-100963 |