Benchmarking scRNA-seq copy number variation callers

Bibliographic Details
Title: Benchmarking scRNA-seq copy number variation callers
Authors: Schmid, Katharina T; Symeonidi, Aikaterini; Hlushchenko, Dmytro; Richter, Maria L; Tijhuis, Andréa E; Foijer, Floris; Colomé-Tatché, Maria
Source: Nature Communications. 16(1)
Publisher Information: Nature Publishing Group, 2025.
Publication Year: 2025
Subject Terms: Benchmarking, Single-Cell Analysis/methods, Humans, DNA Copy Number Variations/genetics, RNA-Seq/methods, Computational Biology/methods, Neoplasms/genetics, Single-Cell Gene Expression Analysis
Description: Copy number variations (CNVs), the gain or loss of genomic regions, are associated with disease, especially cancer. Single cell technologies offer new possibilities to capture within-sample heterogeneity of CNVs and identify subclones relevant for tumor progression and treatment outcome. Several computational tools have been developed to identify CNVs from scRNA-seq data. However, an independent benchmarking of them is lacking. Here, we evaluate six popular methods in their ability to correctly identify ground truth CNVs, euploid cells and subclonal structures in 21 scRNA-seq datasets. We discover dataset-specific factors influencing the performance, including dataset size, the number and type of CNVs in the sample and the choice of the reference dataset. Methods which include allelic information perform more robustly for large droplet-based datasets, but require higher runtime. Furthermore, the methods differ in their additional functionalities. We offer a benchmarking pipeline to identify the optimal method for new datasets, and improve methods' performance.
Document Type: Article
Language: English
ISSN: 2041-1723
DOI: 10.1038/s41467-025-62359-9
Access URL: https://research.rug.nl/en/publications/e0d47017-c370-4bf5-abd0-8854923f8733
https://hdl.handle.net/11370/e0d47017-c370-4bf5-abd0-8854923f8733
Rights: CC BY
Accession Number: edsair.dris...01423..7d5426b6623dabf70fa5c40868dff043
Database: OpenAIRE