Computationally efficient multi-sample flow cytometry data analysis using Gaussian mixture models
Background An important challenge in flow cytometry (FCM) data analysis is making comparisons of corresponding cell populations across multiple FCM samples. An interesting solution is creating a statistical mixture model for multiple samples simultaneously, as such a multi-sample model can character...
Uložené v:
| Vydané v: | BMC bioinformatics Ročník 26; číslo 1; s. 262 - 18 |
|---|---|
| Hlavní autori: | , , , , |
| Médium: | Journal Article |
| Jazyk: | English |
| Vydavateľské údaje: |
London
BioMed Central
23.10.2025
BioMed Central Ltd Springer Nature B.V BMC |
| Predmet: | |
| ISSN: | 1471-2105, 1471-2105 |
| On-line prístup: | Získať plný text |
| Tagy: |
Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
|
| Shrnutí: | Background
An important challenge in flow cytometry (FCM) data analysis is making comparisons of corresponding cell populations across multiple FCM samples. An interesting solution is creating a statistical mixture model for multiple samples simultaneously, as such a multi-sample model can characterize a heterogeneous set of samples, and facilitates direct comparison of cell populations across the data samples. The multi-sample approach to statistical mixture modeling has been explored in a number of reports, mostly within a Bayesian framework and with high computational complexity. Although these approaches are effective, they are also computationally demanding, and therefore do not relate well to the requirement of scalability, which is essential in the multi-sample setting. This limits their utility in the analysis of large sets of large FCM samples.
Results
We show that basic Gaussian mixture models can be extended to large data sets consisting of multiple samples, using a computationally efficient implementation of the expectation-maximization algorithm. We show that the multi-sample Gaussian mixture model (MSGMM) is competitive with other models, in both rare cell detection and sample classification accuracy. This allows us to further explore the utility of MSGMMs in the analysis of heterogeneous sets of samples. We demonstrate how simple heuristics on MSGMM model output can directly reveal structural patterns in a collection of FCM samples.
Conclusions
We recover the efficiency and utility of the basic MSGMM which underlies more complex and non-parametric Bayesian hierarchical mixture models. The possibility of fitting GMMs to large sets of FCM samples provides opportunities for the discovery of associations between sample composition and sample meta-data such as treatment responses and clinical outcomes. |
|---|---|
| Bibliografia: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
| ISSN: | 1471-2105 1471-2105 |
| DOI: | 10.1186/s12859-025-06285-z |