Comparative benchmarking of single-cell clustering algorithms for transcriptomic and proteomic data

Background Differences in data distribution, feature dimensions, and quality between different single-cell modalities pose challenges for clustering. Although clustering algorithms have been developed for single-cell transcriptomic or proteomic data, their performance across different omics data typ...

Full description

Saved in:
Bibliographic Details
Published in:Genome Biology Vol. 26; no. 1; pp. 265 - 34
Main Authors: Yin, Yu-Hang, Wang, Fang, Li, Wei, Liu, Qiaoming, Zhou, Shengming, Zhou, Murong, Jiang, Zhongjun, Yu, Dong-Jun, Wang, Guohua
Format: Journal Article
Language:English
Published: London BioMed Central 03.09.2025
Springer Nature B.V
BMC
Subjects:
ISSN:1474-760X, 1474-7596, 1474-760X
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Background Differences in data distribution, feature dimensions, and quality between different single-cell modalities pose challenges for clustering. Although clustering algorithms have been developed for single-cell transcriptomic or proteomic data, their performance across different omics data types and integration scenarios remains poorly investigated, which limits the selection of methods and future method development. Results In this study, we conduct a systematic and comparative benchmark analysis of 28 computational algorithms on 10 paired transcriptomic and proteomic datasets, evaluating their performance across various metrics in terms of clustering, peak memory, and running time. We also discuss the impact of highly variable genes (HVGs) and cell type granularity on clustering performance. Additionally, the robustness of these clustering methods on two kinds of omics is evaluating by using 30 simulated datasets. Furthermore, to explore the benefits of integrating omics information for clustering tasks, we integrate single-cell transcriptomic and proteomic data using 7 state-of-the-art integration methods and assess the performance of existing single-omics clustering schemes on the integrated features. Conclusions Our findings reveal modality-specific strengths and limitations, highlight the complementary nature of existing methods, and provide actionable insights to guide the selection of appropriate clustering approaches for specific scenarios. Overall, for top performance across two omics, consider scAIDE, scDCC, and FlowSOM, with FlowSOM also offering excellent robustness. For users prioritizing memory efficiency scDCC and scDeepCluster are recommended, while TSCAN, SHARP, and MarkovHC are recommended for users who prioritize time efficiency, and community detection-based methods offer a balance.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1474-760X
1474-7596
1474-760X
DOI:10.1186/s13059-025-03719-y