Clustering of cell populations in flow cytometry data using a combination of Gaussian mixtures

Detailed bibliography
Published in: Pattern Recognition, Vol. 60, pp. 1029-1040
Main authors: Reiter, Michael; Rota, Paolo; Kleber, Florian; Diem, Markus; Groeneveld-Krentz, Stefanie; Dworzak, Michael
Medium: Journal Article
Language: English
Publisher: Elsevier Ltd, 01.12.2016
ISSN: 0031-3203, 1873-5142
Description
Summary: We propose a supervised learning approach to the automatic quantification of cell populations in flow cytometric samples. One sample contains up to millions of measurement vectors with a dimensionality between 10 and 20. Normally, each measurement vector corresponds to a single cell in the biological sample. Identifying biologically meaningful cell populations is essentially a clustering problem; however, standard clustering methods are impractical because the size, shape and location of the corresponding clusters may vary strongly between samples, mainly due to phenotypic differences and inter-laboratory variations. In our holistic approach, we implicitly employ structural information (such as the relative locations and shapes of sub-populations). A new input sample is reconstructed by a linear combination of artificial reference samples, each represented by a Gaussian Mixture Model (GMM) in which the class label of the cluster of observations corresponding to each Gaussian component is known. The reference samples are calculated from a larger set of training samples by non-negative matrix factorization and can be regarded as the basis of a lower-dimensional feature space in which input samples are reconstructed. We show a method for calculating the feature space transformation based on minimizing the L2 distance between two GMMs. The feature space representation of the sample is then used to assign each observation to one of the specified sub-populations by a Bayes decision. We present classification results on a database of about 170 patients with Acute Lymphoblastic Leukemia (ALL), where high accuracy in the prediction of relatively small leukemic populations is crucial. The approach is not limited to our application: it can be employed wherever large, multi-dimensional, numerical data from a specific class of samples with related structure have to be analyzed.
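The L2 distance between two GMMs has a closed form, because the integral of a product of two Gaussian densities is itself a Gaussian density evaluated at one of the means: ∫ N(x; μi, Σi) N(x; μj, Σj) dx = N(μi; μj, Σi + Σj). Expanding ||f - g||² = <f,f> - 2<f,g> + <g,g> therefore reduces to sums over pairs of components. The Python sketch below illustrates this together with a Bayes decision in which every Gaussian component carries a known class label. It is a minimal illustration assuming NumPy and SciPy, not the authors' implementation; the function names (gauss_overlap, cross_term, l2_distance_sq, bayes_labels) and the (weights, means, covariances) tuple layout are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical GMM layout used throughout: (weights, means, covariances).


def gauss_overlap(m1, S1, m2, S2):
    """Closed-form integral of N(x; m1, S1) * N(x; m2, S2) over x,
    which equals the Gaussian density N(m1; m2, S1 + S2)."""
    return multivariate_normal.pdf(m1, mean=m2, cov=S1 + S2)


def cross_term(w_a, mu_a, cov_a, w_b, mu_b, cov_b):
    """Inner product <f_a, f_b> = integral of f_a(x) * f_b(x) for two GMMs."""
    total = 0.0
    for wi, mi, Si in zip(w_a, mu_a, cov_a):
        for wj, mj, Sj in zip(w_b, mu_b, cov_b):
            total += wi * wj * gauss_overlap(mi, Si, mj, Sj)
    return total


def l2_distance_sq(gmm_f, gmm_g):
    """Squared L2 distance: integral of (f - g)^2 = <f,f> - 2<f,g> + <g,g>."""
    return (cross_term(*gmm_f, *gmm_f)
            - 2.0 * cross_term(*gmm_f, *gmm_g)
            + cross_term(*gmm_g, *gmm_g))


def bayes_labels(X, gmm, comp_labels, n_classes):
    """Assign each row of X to the class with the largest posterior.

    The class density is the sum of the weighted components that carry that
    class label. The normalizing denominator is shared by all classes, so it
    is omitted: it does not change the argmax.
    """
    w, mu, cov = gmm
    class_dens = np.zeros((X.shape[0], n_classes))
    for wk, mk, Sk, ck in zip(w, mu, cov, comp_labels):
        class_dens[:, ck] += wk * multivariate_normal.pdf(X, mean=mk, cov=Sk)
    return np.argmax(class_dens, axis=1)
```

For example, with f = (np.array([0.5, 0.5]), [np.zeros(2), np.array([3.0, 0.0])], [np.eye(2), np.eye(2)]), l2_distance_sq(f, f) is zero, and bayes_labels(np.array([[0.0, 0.0], [3.0, 0.0]]), f, [0, 1], 2) assigns the two points to classes 0 and 1. The double loop is quadratic in the number of components, which is inexpensive compared with evaluating densities on millions of measurement vectors.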
• A density model based on the interpolation of Gaussian mixture models is presented.
• Non-negative matrix factorization is used to compress the model.
• Affine registration of Gaussian mixture models using the L2 distance is performed.
• The applicability of the method is demonstrated on flow cytometry data.
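The non-negative matrix factorization step can be sketched as follows, under a simplifying assumption that is not taken from the paper: every training sample is summarized as a non-negative weight vector over one shared, hypothetical dictionary of Gaussian components, and these vectors are stacked as the columns of a matrix V. Lee-Seung multiplicative updates then factor V ≈ WH, where the columns of W play the role of artificial reference samples, and a new sample is reconstructed with non-negative coefficients using scipy.optimize.nnls. This sketch measures reconstruction error with the Euclidean norm on weight vectors rather than with the L2 distance between GMMs, and it omits the affine registration; nmf and reconstruct are hypothetical names.

```python
import numpy as np
from scipy.optimize import nnls


def nmf(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates minimizing ||V - W H||_F^2, W, H >= 0.

    Assumption (illustrative only): rows of V index a shared, hypothetical
    dictionary of Gaussian components; columns of V are training samples.
    Columns of the returned W act as artificial reference samples.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive initialization keeps the multiplicative updates valid.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # eps in the denominators avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def reconstruct(W, v):
    """Non-negative coefficients h minimizing ||W h - v||_2 for a new sample v.

    Simplification: Euclidean error on weight vectors, not the GMM L2 distance.
    """
    h, _residual_norm = nnls(W, v)
    return h
```

The resulting coefficient vector h is the low-dimensional representation of the new sample; for example, reconstruct(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]), np.array([0.3, 0.7, 1.0])) recovers coefficients close to [0.3, 0.7]. Multiplicative updates are simple and keep all factors non-negative by construction, but they can converge slowly; a library solver could be substituted without changing the idea.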
DOI: 10.1016/j.patcog.2016.04.004