SSLCNV: A Semi-supervised Learning Framework for Accurate Copy Number Variation Detection

Bibliographic details
Published in: Interdisciplinary Sciences: Computational Life Sciences
Main authors: Du, Ruchao; Dong, Jinxin; Jiang, Hua; Qi, Minyong; Zhang, Yuxi; Sun, Ranran; Xu, Mengke
Format: Journal Article
Language: English
Published: Germany, 27 November 2025
ISSN: 1867-1462
Description
Summary: Copy number variation (CNV) is a major type of structural variation (SV) that plays critical roles in genetic diversity and disease. Many CNV detection tools have been developed. Although each tool has advantages in specific scenarios, they still suffer from suboptimal sensitivity, imprecise breakpoint resolution, and reduced robustness in complex sequencing environments. Developing more effective CNV detection tools by building on the strengths of existing ones remains a significant challenge in the field. To fully leverage the detection results of existing tools and improve the accuracy of CNV detection under complex sequencing conditions, a new method called SSLCNV (semi-supervised learning framework for CNV detection) is proposed. It combines consensus-based pseudo-labeling with density-based clustering. SSLCNV generates high-confidence pseudo-labels by intersecting CNV predictions from four representative tools (CNVkit, GROM-RD, Matchclips2, OTSUCNV) and uses these as core seeds for clustering. Additionally, SSLCNV introduces a new z-score constraint into the DBSCAN algorithm to enhance clustering accuracy. By leveraging the improved DBSCAN and incorporating reliable labels, SSLCNV effectively detects CNVs from partially labeled and unlabeled data. Comprehensive evaluations on both simulated and real datasets demonstrate that SSLCNV consistently achieves higher F1-scores than existing tools across diverse sequencing depths and tumor purities. Importantly, it maintains robust performance under low-coverage conditions, yielding higher recall without a substantial loss in precision. SSLCNV offers a scalable and accurate solution for CNV detection and is particularly advantageous in scenarios with complex genomic backgrounds.
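The consensus step described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the bin-level representation, the half-open interval convention, the requirement that all tools agree on the CNV type, and the names `bin_labels` and `consensus_seeds` are assumptions made for illustration only. The record does not say how SSLCNV actually computes the intersection.

```python
from typing import Dict, List, Tuple

# Hypothetical call format: (start, end, cnv_type) on one chromosome,
# with half-open coordinates [start, end).
Call = Tuple[int, int, str]


def bin_labels(calls: List[Call], n_bins: int, bin_size: int) -> List[str]:
    """Label every fixed-size bin that a call overlaps with the call's type."""
    labels = ["none"] * n_bins
    for start, end, cnv_type in calls:
        first = start // bin_size
        last = min(n_bins, -(-end // bin_size))  # ceiling division
        for b in range(first, last):
            labels[b] = cnv_type
    return labels


def consensus_seeds(
    calls_by_tool: Dict[str, List[Call]], n_bins: int, bin_size: int
) -> List[str]:
    """Keep a bin's label only if every tool reports the same CNV type there.

    Bins without full agreement are marked "unlabeled" and would be left
    for a later clustering step to assign.
    """
    per_tool = [bin_labels(c, n_bins, bin_size) for c in calls_by_tool.values()]
    seeds = []
    for column in zip(*per_tool):
        first = column[0]
        agreed = first != "none" and all(label == first for label in column)
        seeds.append(first if agreed else "unlabeled")
    return seeds


# Made-up example calls; the tool names come from the summary.
example_calls = {
    "CNVkit": [(1000, 5000, "gain")],
    "GROM-RD": [(2000, 6000, "gain")],
    "Matchclips2": [(1000, 4000, "gain"), (8000, 9000, "loss")],
    "OTSUCNV": [(2000, 5000, "gain")],
}
seeds = consensus_seeds(example_calls, n_bins=10, bin_size=1000)
print(seeds)
```

In this toy input only bins 2 and 3 are called as a gain by all four tools, so only those two bins become seeds. The loss reported by a single tool is left unlabeled, which is the sense in which an intersection trades recall for confidence.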
DOI: 10.1007/s12539-025-00795-3