SSLCNV: A Semi-supervised Learning Framework for Accurate Copy Number Variation Detection

Copy number variation (CNV) is a major type of structural variation (SV) that plays critical roles in genetic diversity and disease. Currently, many CNV detection tools have been developed. Although each tool exhibits different advantages under specific scenarios, they still have disadvantages such...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Interdisciplinary sciences : computational life sciences
Hauptverfasser: Du, Ruchao, Dong, Jinxin, Jiang, Hua, Qi, Minyong, Zhang, Yuxi, Sun, Ranran, Xu, Mengke
Format: Journal Article
Sprache:Englisch
Veröffentlicht: Germany 27.11.2025
Schlagworte:
ISSN:1867-1462, 1867-1462
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Copy number variation (CNV) is a major type of structural variation (SV) that plays critical roles in genetic diversity and disease. Currently, many CNV detection tools have been developed. Although each tool exhibits different advantages under specific scenarios, they still have disadvantages such as suboptimal sensitivity, imprecise breakpoint resolution, and reduced robustness in complex sequencing environments. Developing more effective CNV detection tools by building upon the strengths of existing tools presents a significant challenge in the field. To fully leverage the detection results of existing tools and improve the accuracy of CNV detection under complex sequencing conditions, a new method called SSLCNV (semi-supervised learning framework for CNV detection) is proposed. It combines consensus-based pseudo-labeling using density-based clustering. SSLCNV generates high-confidence pseudo-labels by intersecting CNV predictions from four representative tools (CNVkit, GROM-RD, Matchclips2, OTSUCNV) and uses these as core seeds for clustering. Additionally, SSLCNV introduces a new constraint z-score into the DBSCAN algorithm to enhance clustering accuracy. By leveraging the improved DBSCAN and incorporating reliable labels, SSLCNV effectively detects CNV from partially labeled and unlabeled data. Comprehensive evaluations on both simulated and real datasets demonstrate that SSLCNV consistently achieves superior F1-scores compared to existing tools across diverse sequencing depths and tumor purities. Importantly, it maintains robust performance under low-coverage conditions, yielding higher recall without a substantial loss in precision. SSLCNV offers a scalable and accurate solution for CNV detection, particularly advantageous in scenarios with complex genomic backgrounds.
Bibliographie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1867-1462
1867-1462
DOI:10.1007/s12539-025-00795-3