EDClust: an EM-MM hybrid method for cell clustering in multiple-subject single-cell RNA sequencing

Bibliographic details
Published in: Bioinformatics (Oxford, England), Volume 38, Issue 10, pp. 2692-2699
Main authors: Wei, Xin; Li, Ziyi; Ji, Hongkai; Wu, Hao
Format: Journal Article
Language: English
Published: England: Oxford University Press, 13 May 2022
ISSN: 1367-4803, 1367-4811
Description
Summary: Single-cell RNA sequencing (scRNA-seq) has revolutionized biological research by enabling the measurement of transcriptomic profiles at the single-cell level. With the increasing application of scRNA-seq in larger-scale studies, the problem of appropriately clustering cells emerges when the scRNA-seq data are from multiple subjects. One challenge is the subject-specific variation; systematic heterogeneity from multiple subjects may have a significant impact on clustering accuracy. Existing methods seeking to address such effects suffer from several limitations. We develop a novel statistical method, EDClust, for multi-subject scRNA-seq cell clustering. EDClust models the sequence read counts by a mixture of Dirichlet-multinomial distributions and explicitly accounts for cell-type heterogeneity, subject heterogeneity and clustering uncertainty. An EM-MM hybrid algorithm is derived for maximizing the data likelihood and clustering the cells. We perform a series of simulation studies to evaluate the proposed method and demonstrate the outstanding performance of EDClust. Comprehensive benchmarking on four real scRNA-seq datasets with various tissue types and species demonstrates the substantial accuracy improvement of EDClust compared to existing methods. The R package is freely available at https://github.com/weix21/EDClust. Supplementary data are available at Bioinformatics online.
Notes: The authors wish it to be known that, in their opinion, Xin Wei and Ziyi Li should be regarded as Joint First Authors.
DOI: 10.1093/bioinformatics/btac168
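
Illustrative note: the summary describes modelling read counts with a mixture of Dirichlet-multinomial distributions and assigning cells to clusters under uncertainty. The Python sketch below shows only the generic building blocks of such a model: the Dirichlet-multinomial log-probability of one cell's counts, and an E-step-style posterior over clusters. It is not the EDClust implementation. It omits the subject-specific effects and the MM update that the summary mentions, and the function names `dm_log_pmf` and `responsibilities` and all parameter values are assumptions made for illustration.

```python
import math


def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial.

    counts: non-negative integer counts, one per gene.
    alpha:  positive concentration parameters, one per gene.
    """
    n = sum(counts)
    a = sum(alpha)
    # Terms that depend only on the totals.
    log_p = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    # Per-gene terms.
    for x, al in zip(counts, alpha):
        log_p += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return log_p


def responsibilities(counts, weights, alphas):
    """Posterior probability of each cluster for one cell (generic E-step).

    weights: mixing proportions, one per cluster (assumed to sum to 1).
    alphas:  one concentration vector per cluster.
    """
    log_post = [
        math.log(w) + dm_log_pmf(counts, al)
        for w, al in zip(weights, alphas)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(v - m) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Toy example with two genes and two hypothetical clusters.
cell = [9, 1]
post = responsibilities(cell, [0.5, 0.5], [[9.0, 1.0], [1.0, 9.0]])
```

In this toy example, `post` holds the cell's soft cluster assignment. A full method would alternate such a step with parameter updates until the likelihood converges.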