scSSA: A clustering method for single cell RNA-seq data based on semi-supervised autoencoder

Bibliographic Details
Published in:	Methods (San Diego, Calif.) Vol. 208; pp. 66-74
Main Authors:	Zhao, Jian-Ping; Hou, Tong-Shuai; Su, Yansen; Zheng, Chun-Hou
Format:	Journal Article
Language:	English
Published:	Elsevier Inc, 1 December 2022
Subjects:	data collection; epigenome; Fast independent component analysis; Gaussian mixture clustering; gene expression; genes; independent component analysis; RNA; scRNA-seq; Semi-supervised autoencoder; sequence analysis; transcriptome
ISSN:	1046-2023, 1095-9130

Description
Summary:	• In this study, we propose using a semi-supervised autoencoder to reduce the dimension of the data, because it makes good use of existing label information. • In Gaussian mixture clustering, we use the BIC index for automatic parameter selection, which reduces the workload of manual parameter adjustment. • Compared with existing methods, our model shows better overall performance. Single-cell sequencing is a technology for high-throughput sequencing analysis of the genome, transcriptome and epigenome at the single-cell level. It can overcome shortcomings of traditional methods, reveal the gene structure and gene expression state of a single cell, and reflect the heterogeneity between cells. Clustering analysis of single-cell RNA data is a very important step, but it faces two difficulties: dropout events and the curse of dimensionality. At present, many methods are driven only by data and do not make full use of existing biological information. In this work, we propose scSSA, a clustering model based on a semi-supervised autoencoder, fast independent component analysis (FastICA) and Gaussian mixture clustering. First, the semi-supervised autoencoder imputes and denoises the scRNA-seq data and produces a low-dimensional latent representation. Second, the dimension of this latent representation is further reduced by FastICA, and the result is clustered with a Gaussian mixture model. Finally, scSSA is compared with Seurat, CIDR and other methods on 10 public scRNA-seq datasets. The results show that scSSA has superior performance in cell clustering on these 10 public datasets. In conclusion, scSSA can accurately identify cell types and is generally applicable to many kinds of single-cell datasets. scSSA has great application potential in the field of scRNA-seq data analysis. The code is available at https://github.com/houtongshuai123/scSSA/
DOI:	10.1016/j.ymeth.2022.10.006
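
The second stage described in the summary (FastICA on the latent representation, followed by Gaussian mixture clustering with the number of components chosen by BIC) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of scikit-learn, the function name `cluster_latent`, its parameter defaults, and the synthetic data standing in for the autoencoder output are all assumptions. The semi-supervised autoencoder itself is not shown.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.mixture import GaussianMixture


def cluster_latent(latent, n_ica_components=5, max_clusters=6, seed=0):
    """Reduce a latent matrix with FastICA, then keep the GMM with the lowest BIC.

    Hypothetical helper: the name and default values are illustrative only.
    """
    # Step 1: further dimension reduction of the latent representation.
    ica = FastICA(n_components=n_ica_components, random_state=seed, max_iter=500)
    reduced = ica.fit_transform(latent)

    # Step 2: fit one Gaussian mixture per candidate cluster count
    # and record its BIC (lower is better).
    bics, models = {}, {}
    for k in range(1, max_clusters + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=seed)
        gmm.fit(reduced)
        bics[k] = gmm.bic(reduced)
        models[k] = gmm

    # Step 3: select the cluster count that minimizes BIC and assign labels.
    best_k = min(bics, key=bics.get)
    labels = models[best_k].predict(reduced)
    return labels, best_k, bics


# Synthetic stand-in for the autoencoder's latent output (an assumption):
# 300 "cells" in 20 latent dimensions, drawn around 3 well-separated centers.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 20))
latent = np.vstack([c + rng.normal(size=(100, 20)) for c in centers])

labels, best_k, bics = cluster_latent(latent)
print("selected number of clusters:", best_k)
```

Selecting the component count by BIC replaces a manual search over cluster numbers, which is the workload reduction the summary refers to. In practice the candidate range and the covariance type would need to be chosen for the dataset at hand.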