Learning from small medical data—robust semi-supervised cancer prognosis classifier with Bayesian variational autoencoder
| Published in: | Bioinformatics Advances, Vol. 3, Issue 1, p. vbac100 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | England: Oxford University Press, 2023 |
| Keywords: | |
| ISSN: | 2635-0041, 2635-0041 |
| Online access: | Full text |
| Summary: | Abstract
Motivation
Cancer is one of the world’s leading causes of mortality, and its prognosis is hard to predict due to complicated biological interactions among heterogeneous data types. Numerous challenges, such as censoring, high dimensionality and small sample size, prevent researchers from using deep learning models for precise prediction.
Results
We propose a robust Semi-supervised Cancer prognosis classifier with bAyesian variational autoeNcoder (SCAN) as a structured machine-learning framework for cancer prognosis prediction. SCAN incorporates semi-supervised learning for predicting 5-year disease-specific survival and overall survival in breast and non-small cell lung cancer (NSCLC) patients, respectively. SCAN achieved significantly better AUROC scores than all existing benchmarks (81.73% for breast cancer; 80.46% for NSCLC), including our previously proposed bimodal neural network classifiers (77.71% for breast cancer; 78.67% for NSCLC). Independent validation results showed that SCAN still achieved better AUROC scores (74.74% for breast; 72.80% for NSCLC) than the bimodal neural network classifiers (64.13% for breast; 67.07% for NSCLC). SCAN is general and can potentially be trained on more patient data. This lays the foundation for personalized medicine in early cancer risk screening.
Availability and implementation
The source code for reproducing the main results is available on GitHub: https://gitfront.io/r/user-4316673/36e8714573f3fbfa0b24690af5d1a9d5ca159cf4/scan/.
Supplementary information
Supplementary data are available at Bioinformatics Advances online. |
|---|---|
| DOI: | 10.1093/bioadv/vbac100 |
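
The summary above compares classifiers by AUROC. As a reminder of what that metric measures, here is a minimal sketch in plain Python that computes it as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half. It is not taken from the SCAN source code; the function name `auroc` and the toy labels and scores are assumptions made up for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: iterable of 0/1, where 1 marks the positive class.
    scores: iterable of predicted scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (not from the paper).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auroc(toy_labels, toy_scores)
print(f"AUROC = {toy_auc:.4f}")
```

In this toy case 8 of the 9 positive-negative pairs are ranked correctly, so the score is 8/9. The quadratic pairwise loop is chosen for readability; a rank-based implementation would be preferable for large cohorts.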