Haematology dimension reduction, a large scale application to regular care haematology data.
| Title: | Haematology dimension reduction, a large scale application to regular care haematology data. |
|---|---|
| Authors: | Joosse HJ; Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands., Chumsaeng-Reijers C; Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands., Huisman A; Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands., Hoefer IE; Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands., van Solinge WW; Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands., Haitjema S; Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands., van Es B; Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands. b.vanes-3@umcutrecht.nl. |
| Source: | BMC medical informatics and decision making [BMC Med Inform Decis Mak] 2025 Feb 12; Vol. 25 (1), pp. 75. Date of Electronic Publication: 2025 Feb 12. |
| Publication Type: | Journal Article |
| Language: | English |
| Journal Info: | Publisher: BioMed Central Country of Publication: England NLM ID: 101088682 Publication Model: Electronic Cited Medium: Internet ISSN: 1472-6947 (Electronic) Linking ISSN: 14726947 NLM ISO Abbreviation: BMC Med Inform Decis Mak Subsets: MEDLINE |
| Imprint Name(s): | Original Publication: London : BioMed Central, [2001- |
| MeSH Terms: | Hematology*/methods , Hematology*/statistics & numerical data , Electronic Data Processing*/methods , Blood Cell Count*/statistics & numerical data , Principal Component Analysis*, Big Data ; Data Visualization ; Datasets as Topic ; Humans ; Software ; Male ; Female ; Adult ; Middle Aged ; Aged |
| Abstract: | Competing Interests: Declarations. Ethics approval and consent to participate: The institutional review board (Medical Research Ethics Committee NedMec) waived the need for informed consent, as only pseudonymized data were used for a large patient sample. The study was conducted in accordance with the Declaration of Helsinki. This study was not subject to the Human Subjects Act (in Dutch: Wet Medisch-Wetenschappelijk onderzoek met mensen, WMO) and we therefore obtained a waiver for study approval from the institutional review board (Medical Research Ethics Committee NedMec). Consent for publication: Not applicable. Competing interests: The authors declare no competing interests. Background: The routine diagnostic process increasingly entails the processing of high-volume and high-dimensional data that cannot be directly visualised. This processing may pose scaling issues that limit the implementation of these types of data into research as well as integrated diagnostics in routine care. Here, we investigate whether we can use existing dimension reduction techniques to provide visualisations and analyses for a complete blood count (CBC) while maintaining representativeness of the original data. Methods: We considered over 3 million CBC measurements encompassing over 70 parameters of cell frequency, size and complexity from the UMC Utrecht UPOD database. We evaluated PCA as an example of a linear dimension reduction technique and UMAP, TriMap and PaCMAP as non-linear dimension reduction techniques. We assessed their technical performance using quality metrics for dimension reduction as well as biological representation by evaluating preservation of diurnal, age and sex patterns, cluster preservation and the identification of leukemia patients. Results: We found that, for clinical hematology data, PCA performs systematically better than UMAP, TriMap and PaCMAP in representing the underlying data. Biological relevance was retained for periodicity in the data. However, we also observed a decrease in predictive performance of the reduced data for both age and sex, as well as an overestimation of clusters within the reduced data. Finally, we were able to identify the diverging patterns for leukemia patients after use of dimensionality reduction methods. Conclusions: We conclude that for hematology data, the use of unsupervised dimension reduction techniques should be limited to data visualization applications, as implementing them in diagnostic pipelines may lead to decreased quality of integrated diagnostics in routine care. (© 2025. The Author(s).) |
| References: | Clin Chem Lab Med. 2007;45(1):13-9. (PMID: 17243908) Comput Biol Chem. 2016 Dec;65:165-172. (PMID: 27687329) Commun Biol. 2022 Jul 19;5(1):719. (PMID: 35853932) Heliyon. 2021 Feb 06;7(2):e06199. (PMID: 33644472) J Immunol. 2022 Nov 15;209(10):1999-2011. (PMID: 36426946) Nat Biotechnol. 2018 Dec 03;:. (PMID: 30531897) Neural Comput. 2021 Oct 12;33(11):2881-2907. (PMID: 34474477) Scand J Clin Lab Invest. 2011 Nov;71(7):532-41. (PMID: 21988588) Cell Rep. 2021 Jul 27;36(4):109442. (PMID: 34320340) J Clin Pathol. 1989 Feb;42(2):172-9. (PMID: 2921359) BMC Bioinformatics. 2020 Oct 29;21(1):485. (PMID: 33121431) Sleep. 2012 Jul 01;35(7):933-40. (PMID: 22754039) Atheroscler Plus. 2023 Jun 01;52:32-40. (PMID: 37389152) BMC Bioinformatics. 2003 Oct 13;4:48. (PMID: 14552657) Nat Biotechnol. 2019 Dec;37(12):1482-1492. (PMID: 31796933) Stat Methods Med Res. 2007 Jun;16(3):219-42. (PMID: 17621469) Biomedicines. 2022 Mar 09;10(3):. (PMID: 35327435) Bioinformatics. 2012 Jan 1;28(1):112-8. (PMID: 22039212) Blood. 2006 Mar 1;107(5):1747-50. (PMID: 16189263) Comput Struct Biotechnol J. 2021 May 21;19:3160-3175. (PMID: 34141137) Cytometry A. 2015 Jul;87(7):636-45. (PMID: 25573116) Neural Netw. 2020 Oct;130:206-228. (PMID: 32688204) Sci Rep. 2023 Jun 7;13(1):9223. (PMID: 37286717) J Phys Chem B. 2021 May 20;125(19):5022-5034. (PMID: 33973773) BMC Emerg Med. 2022 Dec 23;22(1):208. (PMID: 36550392) Nat Med. 2021 Sep;27(9):1582-1591. (PMID: 34426707) Cancer Med. 2023 Jun;12(11):12462-12469. (PMID: 37076947) iScience. 2022 Sep 15;25(10):105142. (PMID: 36193047) J Thromb Thrombolysis. 2023 Nov;56(4):614-625. (PMID: 37596427) BMC Bioinformatics. 2012 Feb 03;13:24. (PMID: 22305354) |
| Contributed Indexing: | Keywords: Clustering; Data preservation; Dimension reduction; Haematology; Routine care data |
| Entry Date(s): | Date Created: 20250212 Date Completed: 20250710 Latest Revision: 20250710 |
| Update Code: | 20250711 |
| PubMed Central ID: | PMC11823074 |
| DOI: | 10.1186/s12911-025-02899-8 |
| PMID: | 39939843 |
| Database: | MEDLINE |
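
The kind of evaluation the abstract describes (reduce a wide table of measurements, then check how well the reduced data represent the original) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the data are synthetic rather than UPOD measurements, the function names `standardise`, `pca_reduce` and `neighbour_preservation` are hypothetical, and the neighbour-overlap score is a simple stand-in, not one of the quality metrics used by the authors. Only NumPy is required.

```python
import numpy as np


def standardise(X):
    """Centre each column and scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std


def pca_reduce(Z, n_components=2):
    """Project standardised rows of Z onto the top principal components.

    Returns the embedding and the fraction of total variance it retains.
    """
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    embedding = Z @ Vt[:n_components].T
    explained = float((S[:n_components] ** 2).sum() / (S ** 2).sum())
    return embedding, explained


def neighbour_preservation(Z, Y, k=10):
    """Mean fraction of each point's k nearest neighbours in Z that are
    also among its k nearest neighbours in the embedding Y."""

    def knn(A):
        sq = (A ** 2).sum(axis=1)
        d = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)  # squared distances
        np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
        return np.argsort(d, axis=1)[:, :k]

    overlap = [len(set(a) & set(b)) for a, b in zip(knn(Z), knn(Y))]
    return float(np.mean(overlap)) / k


# Synthetic stand-in: 500 samples, 70 correlated parameters driven by
# 3 latent factors plus noise (an assumption, not real CBC data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 70))
X = latent @ mixing + 0.5 * rng.normal(size=(500, 70))

Z = standardise(X)
embedding, explained = pca_reduce(Z, n_components=2)
score = neighbour_preservation(Z, embedding, k=10)
full_embedding, full_explained = pca_reduce(Z, n_components=Z.shape[1])
full_score = neighbour_preservation(Z, full_embedding, k=10)

print(f"variance retained by 2 components: {explained:.3f}")
print(f"10-nearest-neighbour overlap after reduction: {score:.3f}")
```

Keeping all components amounts to a rotation of the standardised data, so `full_score` should be close to 1; the gap between `full_score` and `score` gives a rough sense of how much local structure a two-dimensional view discards.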