RCA2: a scalable supervised clustering algorithm that reduces batch effects in scRNA-seq data

Bibliographic details
Published in: Nucleic Acids Research, vol. 49, no. 15, pp. 8505-8519
Main authors: Schmidt, Florian; Ranjan, Bobby; Lin, Quy Xiao Xuan; Krishnan, Vaidehi; Joanito, Ignasius; Honardoost, Mohammad Amin; Nawaz, Zahid; Venkatesh, Prasanna Nori; Tan, Joanna; Rayan, Nirmala Arul; Ong, Sin Tiong; Prabhakar, Shyam
Format: Journal Article
Language: English
Published: England, Oxford University Press, 7 September 2021
ISSN: 0305-1048, 1362-4962
Description
Summary: The transcriptomic diversity of cell types in the human body can be analysed in unprecedented detail using single cell (SC) technologies. Unsupervised clustering of SC transcriptomes, which is the default technique for defining cell types, is prone to group cells by technical, rather than biological, variation. Compared to de-novo (unsupervised) clustering, we demonstrate using multiple benchmarks that supervised clustering, which uses reference transcriptomes as a guide, is robust to batch effects and data quality artifacts. Here, we present RCA2, the first algorithm to combine reference projection (batch effect robustness) with graph-based clustering (scalability). In addition, RCA2 provides a user-friendly framework incorporating multiple commonly used downstream analysis modules. RCA2 also provides new reference panels for human and mouse and supports generation of custom panels. Furthermore, RCA2 facilitates cell type-specific QC, which is essential for accurate clustering of data from heterogeneous tissues. We demonstrate the advantages of RCA2 on SC data from human bone marrow, healthy PBMCs and PBMCs from COVID-19 patients. Scalable supervised clustering methods such as RCA2 will facilitate unified analysis of cohort-scale SC datasets.
Notes: The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.
DOI: 10.1093/nar/gkab632