An active learning approach for clustering single-cell RNA-seq data


Bibliographic details
Published in: Laboratory investigation, Vol. 102, No. 3, pp. 227-235
Main authors: Lin, Xiang; Liu, Haoran; Wei, Zhi; Roy, Senjuti Basu; Gao, Nan
Format: Journal Article
Language: English
Published: New York: Elsevier Inc, 1 March 2022
Nature Publishing Group US
Nature Publishing Group
ISSN: 0023-6837, 1530-0307
Description
Summary: Single-cell RNA sequencing (scRNA-seq) data has been widely used to profile cellular heterogeneity at high resolution. Clustering is a crucial step of scRNA-seq data analysis because it makes it possible to identify previously undiscovered cell types. Most methods for clustering scRNA-seq data use an unsupervised learning strategy. Because the clustering step is separated from the cell annotation and labeling step, it is not uncommon for the output to be an entirely unfamiliar clustering with poor biological interpretability, a result generally undesired by biologists. To solve this problem, we propose an active learning (AL) framework for clustering scRNA-seq data. The AL model employs a learning algorithm that can actively query biologists for labels, and this manual labeling is expected to be applied to only a subset of cells. To develop an optimal active learning approach, we explored several key parameters of the AL model in experiments with four real scRNA-seq datasets. We demonstrate that the proposed AL model outperformed state-of-the-art unsupervised clustering methods with fewer than 1000 labeled cells. We therefore conclude that the AL model is a promising tool for clustering scRNA-seq data that achieves superior performance effectively and efficiently. The active learning (AL) model is a framework designed for single-cell RNA sequencing (scRNA-seq) clustering. It requires researchers to label a small number of cells selected by a sample selection algorithm. The labeled cells are then used to supervise the clustering, which significantly boosts clustering performance on scRNA-seq data.
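The query loop described in the summary can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' method: the record does not say which sample selection algorithm or model the paper uses, so the sketch swaps in margin-based uncertainty sampling with a nearest-centroid classifier, runs on synthetic Gaussian "cells" rather than real scRNA-seq data, and uses a lookup of the true type as a stand-in for the biologist. The names `make_cells`, `centroids`, `ranked_centroids`, and `active_learning` are hypothetical.

```python
import math
import random


def make_cells(n_per_type=60, n_genes=6, n_types=3, seed=0):
    """Synthetic stand-in for an expression matrix: one Gaussian blob per cell type."""
    rng = random.Random(seed)
    cells, types = [], []
    for t in range(n_types):
        for _ in range(n_per_type):
            # Genes with g % n_types == t act as "marker genes" for type t.
            cells.append([rng.gauss(3.0 if g % n_types == t else 0.0, 1.0)
                          for g in range(n_genes)])
            types.append(t)
    return cells, types


def centroids(cells, labeled):
    """Mean expression profile per label, computed from the labeled cells only."""
    groups = {}
    for i, lab in labeled.items():
        groups.setdefault(lab, []).append(cells[i])
    return {lab: [sum(col) / len(rows) for col in zip(*rows)]
            for lab, rows in groups.items()}


def ranked_centroids(cell, cents):
    """(distance, label) pairs for every centroid, nearest first."""
    return sorted((math.dist(cell, c), lab) for lab, c in cents.items())


def active_learning(cells, oracle, seed_idx, budget=30):
    """Pool-based loop: repeatedly ask the oracle to label the most ambiguous cell."""
    # Seed set: a few cells labeled up front (assumed here: one per type).
    labeled = {i: oracle(i) for i in seed_idx}
    while len(labeled) < min(budget, len(cells)):
        cents = centroids(cells, labeled)
        pool = [i for i in range(len(cells)) if i not in labeled]

        def margin(i):
            # Small gap between the two nearest centroids = uncertain cell.
            ranked = ranked_centroids(cells[i], cents)
            return ranked[1][0] - ranked[0][0]

        query = min(pool, key=margin)
        labeled[query] = oracle(query)  # the "biologist" supplies one label
    cents = centroids(cells, labeled)
    # Final assignment: every cell goes to its nearest labeled centroid.
    return [ranked_centroids(c, cents)[0][1] for c in cells], labeled


cells, truth = make_cells()
assignments, labeled = active_learning(
    cells, oracle=lambda i: truth[i], seed_idx=[0, 60, 120], budget=30)
accuracy = sum(a == t for a, t in zip(assignments, truth)) / len(truth)
print(f"labeled {len(labeled)} of {len(cells)} cells, accuracy {accuracy:.2f}")
```

The label budget (`budget`) and the choice of selection rule are the kind of parameters the summary says were explored; the values used here are arbitrary.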
Author contributions
X.L., Z.W., and S.B. performed study design and development of methodology; Z.W., S.B., and N.G. reviewed and revised the paper; X.L. and H.L. performed data analysis, interpretation, and statistical analysis; Z.W. and S.B. provided technical and material support. All authors read and approved the final paper.
DOI: 10.1038/s41374-021-00639-w