SSLCNV: A Semi-supervised Learning Framework for Accurate Copy Number Variation Detection
| Published in: | Interdisciplinary sciences : computational life sciences |
|---|---|
| Main authors: | , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Germany, 27.11.2025 |
| Keywords: | |
| ISSN: | 1867-1462, 1867-1462 |
| Online access: | Full text |
| Abstract: | Copy number variation (CNV) is a major type of structural variation (SV) that plays critical roles in genetic diversity and disease. Many CNV detection tools have been developed. Although each tool has advantages in specific scenarios, they still suffer from suboptimal sensitivity, imprecise breakpoint resolution, and reduced robustness in complex sequencing environments. Developing more effective CNV detection tools by building on the strengths of existing tools remains a significant challenge in the field. To fully leverage the detection results of existing tools and improve the accuracy of CNV detection under complex sequencing conditions, a new method called SSLCNV (semi-supervised learning framework for CNV detection) is proposed. It combines consensus-based pseudo-labeling with density-based clustering. SSLCNV generates high-confidence pseudo-labels by intersecting CNV predictions from four representative tools (CNVkit, GROM-RD, Matchclips2, OTSUCNV) and uses these as core seeds for clustering. Additionally, SSLCNV introduces a new constraint z-score into the DBSCAN algorithm to enhance clustering accuracy. By leveraging the improved DBSCAN and incorporating reliable labels, SSLCNV effectively detects CNVs from partially labeled and unlabeled data. Comprehensive evaluations on both simulated and real datasets demonstrate that SSLCNV consistently achieves higher F1-scores than existing tools across diverse sequencing depths and tumor purities. Importantly, it maintains robust performance under low-coverage conditions, yielding higher recall without a substantial loss in precision. SSLCNV offers a scalable and accurate solution for CNV detection, particularly advantageous in scenarios with complex genomic backgrounds. |
|---|---|
| DOI: | 10.1007/s12539-025-00795-3 |
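
The consensus step mentioned in the abstract (intersecting CNV predictions from four tools to obtain high-confidence pseudo-labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the intersection is computed, so the half-open interval representation, the function names `intersect` and `consensus`, and the coordinates below are all assumptions made for illustration. Only the four tool names come from the record. The sketch treats one chromosome and one CNV type at a time.

```python
from functools import reduce
from typing import Dict, List, Tuple

# Hypothetical representation: half-open [start, end) intervals on one chromosome.
Interval = Tuple[int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, non-overlapping interval lists with a two-pointer sweep."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # keep only non-empty overlaps
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def consensus(calls_by_tool: Dict[str, List[Interval]]) -> List[Interval]:
    """Regions called by every tool (assumed rule: strict intersection)."""
    return reduce(intersect, calls_by_tool.values())


# Made-up coordinates, for illustration only.
calls = {
    "CNVkit": [(100, 500), (900, 1200)],
    "GROM-RD": [(150, 520), (1000, 1300)],
    "Matchclips2": [(120, 480)],
    "OTSUCNV": [(90, 490), (950, 1100)],
}
seeds = consensus(calls)
print(seeds)
```

In this toy input only the first region is supported by all four tools, so it alone would serve as a seed. The second region, missed by one tool, would be left unlabeled for the clustering stage to resolve.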