scGMM-VGAE: a Gaussian mixture model-based variational graph autoencoder algorithm for clustering single-cell RNA-seq data

Cell type identification using single-cell RNA sequencing data is critical for understanding disease mechanisms and drug discovery. Cell clustering analysis has been widely studied in health research for rare tumor cell detection. In this study, we propose a Gaussian mixture model-based variational...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Machine learning: science and technology Ročník 4; číslo 3; s. 35013 - 35024
Hlavní autoři: Lin, Eric, Liu, Boyuan, Lac, Leann, Fung, Daryl L X, Leung, Carson K, Hu, Pingzhao
Médium: Journal Article
Jazyk:angličtina
Vydáno: Bristol IOP Publishing 01.09.2023
Témata:
ISSN:2632-2153, 2632-2153
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Cell type identification using single-cell RNA sequencing data is critical for understanding disease mechanisms and drug discovery. Cell clustering analysis has been widely studied in health research for rare tumor cell detection. In this study, we propose a Gaussian mixture model-based variational graph autoencoder on scRNA-seq data (scGMM-VGAE) that integrates a statistical clustering model to a deep learning algorithm to significantly improve the cell clustering performance. This model feeds a cell-cell graph adjacency matrix and a gene feature matrix into a graph variational autoencoder (VGAE) to generate latent data. These data are then used for cell clustering by the Gaussian mixture model (GMM) module. To optimize the algorithm, a designed loss function is derived by combining parameter estimates from the GMM and VGAE. We test the proposed method on four publicly available and three simulated datasets which contain many biological and technical zeros. The scGMM-VGAE outperforms four selected baseline methods on three evaluation metrics in cell clustering. By successfully incorporating GMM into deep learning VGAE on scRNA-seq data, the proposed method shows higher accuracy in cell clustering on scRNA-seq data. This improvement has a significant impact on detecting rare cell types in health research. All source codes used in this study can be found at https://github.com/ericlin1230/scGMM-VGAE .
Bibliografie:MLST-100777.R2
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2632-2153
2632-2153
DOI:10.1088/2632-2153/acd7c3