An active learning approach for clustering single-cell RNA-seq data
| Published in: | Laboratory investigation Vol. 102; No. 3; pp. 227-235 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York, Elsevier Inc, 1 March 2022; Nature Publishing Group US; Nature Publishing Group |
| Keywords: | |
| ISSN: | 0023-6837, 1530-0307 |
| Online access: | Full text |
| Abstract: | Single-cell RNA sequencing (scRNA-seq) data has been widely used to profile cellular heterogeneity at high resolution. Clustering analysis is a crucial step of scRNA-seq data analysis because it provides a chance to identify and uncover undiscovered cell types. Most methods for clustering scRNA-seq data use an unsupervised learning strategy. Since the clustering step is separated from the cell annotation and labeling step, it is not uncommon for a totally exotic clustering with poor biological interpretability to be generated, a result generally undesired by biologists. To solve this problem, we proposed an active learning (AL) framework for clustering scRNA-seq data. The AL model employed a learning algorithm that can actively query biologists for labels, and this manual labeling is expected to be applied to only a subset of cells. To develop an optimal active learning approach, we explored several key parameters of the AL model in experiments with four real scRNA-seq datasets. We demonstrate that the proposed AL model outperformed state-of-the-art unsupervised clustering methods with fewer than 1000 labeled cells. Therefore, we conclude that the AL model is a promising tool for clustering scRNA-seq data that allows us to achieve superior performance effectively and efficiently. The active learning (AL) model is a framework designed for clustering single-cell RNA sequencing (scRNA-seq) data. The model requires researchers to label a small number of cells chosen by a sample selection algorithm. The labeled cells are then used to supervise the clustering, which significantly boosts clustering performance on scRNA-seq data. |
|---|---|
| Bibliography: | Author contributions: X.L., Z.W., and S.B. performed study design and development of methodology; Z.W., S.B., and N.G. reviewed and revised the paper; X.L. and H.L. performed data analysis and interpretation, and statistical analysis; Z.W. and S.B. provided technical and material support. All authors read and approved the final paper. |
| DOI: | 10.1038/s41374-021-00639-w |
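
The abstract describes a loop in which a learning algorithm queries biologists for labels on a subset of cells and uses those labels to supervise the grouping. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes pool-based uncertainty sampling with a nearest-centroid classifier, neither of which is stated in the record, and it runs on synthetic two-dimensional data. The names `class_centroids`, `margin`, `active_learning`, and `oracle` are hypothetical, with `oracle` standing in for the biologist.

```python
import math
import random


def class_centroids(cells, labeled):
    """Mean vector of the labeled cells of each cell type."""
    groups = {}
    for idx, cell_type in labeled.items():
        groups.setdefault(cell_type, []).append(cells[idx])
    return {
        cell_type: [sum(col) / len(members) for col in zip(*members)]
        for cell_type, members in groups.items()
    }


def margin(cell, centroids):
    """Distance gap between the two nearest centroids (small = uncertain)."""
    dists = sorted(math.dist(cell, c) for c in centroids.values())
    return dists[1] - dists[0] if len(dists) > 1 else 0.0


def active_learning(cells, oracle, n_seed=4, n_queries=20, seed=0):
    """Hypothetical loop: query the oracle for the least certain cells."""
    rng = random.Random(seed)
    # Start from a few randomly chosen cells labeled by the oracle.
    labeled = {i: oracle(i) for i in rng.sample(range(len(cells)), n_seed)}
    for _ in range(n_queries):
        centroids = class_centroids(cells, labeled)
        pool = [i for i in range(len(cells)) if i not in labeled]
        if not pool:
            break
        # Ask for a label on the cell the current model is least sure about.
        query = min(pool, key=lambda i: margin(cells[i], centroids))
        labeled[query] = oracle(query)
    # Assign every cell to the nearest centroid of the labeled cells.
    centroids = class_centroids(cells, labeled)
    predicted = [
        min(centroids, key=lambda t: math.dist(cell, centroids[t]))
        for cell in cells
    ]
    return predicted, labeled


# Synthetic stand-in for expression profiles: two well-separated groups.
data_rng = random.Random(1)
truth = ["A" if i % 2 == 0 else "B" for i in range(100)]
cells = [
    [data_rng.gauss(0.0 if t == "A" else 20.0, 1.0) for _ in range(2)]
    for t in truth
]
predicted, labeled = active_learning(cells, oracle=lambda i: truth[i])
accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
print(f"labeled {len(labeled)} of {len(cells)} cells, accuracy {accuracy:.2f}")
```

In this toy setting only the seed cells and the queried cells are ever shown to the oracle; every other cell is assigned from those labels. A real scRNA-seq workflow would likely need normalization, dimensionality reduction, and a stronger classifier, none of which are sketched here.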