A framework to create, evaluate and select synthetic datasets for survival prediction in oncology

Bibliographic Details
Published in: Computers in biology and medicine Vol. 192; no. Pt A; p. 110198
Main Authors: Christoforou, A.T., Spohn, S.K.B., Sprave, T., Fechter, T., Rühle, A., Nicolay, N.H., Popp, I., Grosu, A.L., Peeken, J.C., Thieme, A.H., Stylianopoulos, T., Strouthos, I., Ferentinos, K., Roussakis, Y., Zamboglou, C.
Format: Journal Article
Language: English
Published: United States Elsevier Ltd 01.06.2025
ISSN: 0010-4825, 1879-0534
Abstract: Data-driven decision-making in radiation oncology (RO) relies on integrating real-world data effectively. Synthetic data (SD), generated through machine learning, offers a solution by mimicking real-world data without compromising privacy. This paper presents a general framework for generating, evaluating, and selecting high-quality tabular SD for clinical use, focusing on survival datasets in RO. Five retrospectively collected survival-based RO datasets (n = 1038 recurrent prostate cancer, n = 117 primary localised prostate cancer, n = 48 primary nodal positive (metastasised) prostate cancer, n = 1269 head and neck cancer, n = 353 gliomas) underwent cleaning and preparation. SD was generated using four different machine-learning models, with each model producing multiple variants. These variants were evaluated for privacy, clinical behaviour, and feature distribution using robust and interpretable metrics, and a single SDset was selected for each real-world dataset using a weighted ranking system. The framework successfully generated high-quality SD for every real-world dataset, with the Tabular Variational Autoencoder producing the five best-performing SDsets considering all metrics. No more than 5% of rows overlapped between each synthetic and real-world dataset. Cox proportional hazards models for the real-world and synthetic datasets achieved similar concordance indexes (average C-index 0.701 for the real-world datasets vs 0.699 for the SD), with every SD hazard ratio falling within the 95% confidence interval of its real-world counterpart for 4 of the 5 real-world datasets. The proposed framework enables the production and selection of SDsets that closely mirror real-world data characteristics, ensuring privacy and clinical utility in RO. This approach can facilitate data sharing in clinical research, addressing privacy-related barriers.
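The abstract states that one SDset per real-world dataset was selected "using a weighted ranking system" but does not give the metrics, weights, or tie-breaking rule. The sketch below is only one plausible reading: each candidate is ranked per metric, the ranks are combined with weights, and the lowest weighted rank sum wins. The function name, metric names, weights, and all numeric values are hypothetical illustrations, not taken from the paper.

```python
from typing import Dict, List


def weighted_rank_select(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    higher_is_better: Dict[str, bool],
) -> List[str]:
    """Order candidates by weighted sum of per-metric ranks (best first).

    Assumption: rank 1 is best on each metric and ties are broken by
    input order; the paper's actual scheme may differ.
    """
    names = list(scores)
    totals = {name: 0.0 for name in names}
    for metric, weight in weights.items():
        # Sort candidates from best to worst on this metric.
        ordered = sorted(
            names,
            key=lambda n: scores[n][metric],
            reverse=higher_is_better[metric],
        )
        for rank, name in enumerate(ordered, start=1):
            totals[name] += weight * rank
    # Lowest weighted rank sum is the selected candidate.
    return sorted(names, key=lambda n: totals[n])


# Made-up scores for three hypothetical synthetic-data variants.
candidates = {
    "variant_a": {"row_overlap": 0.02, "c_index_gap": 0.004, "dist_similarity": 0.91},
    "variant_b": {"row_overlap": 0.00, "c_index_gap": 0.030, "dist_similarity": 0.84},
    "variant_c": {"row_overlap": 0.04, "c_index_gap": 0.015, "dist_similarity": 0.88},
}
weights = {"row_overlap": 1.0, "c_index_gap": 2.0, "dist_similarity": 1.0}
higher_is_better = {"row_overlap": False, "c_index_gap": False, "dist_similarity": True}

ranking = weighted_rank_select(candidates, weights, higher_is_better)
print(ranking)  # ['variant_a', 'variant_c', 'variant_b']
```

Ranking before weighting keeps metrics on different scales comparable, at the cost of discarding how large the gaps between candidates are.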
• A novel framework for generating and evaluating deep learning synthetic datasets.
• Framework tested on 5 clinical datasets containing survival and treatment details.
• Robust metrics assess privacy, clinical relevance, and data distribution.
• Synthetic data retains real-world characteristics, ensuring privacy and usability.
• Framework advances secure data sharing in medicine, addressing privacy issues.
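The privacy result in the abstract ("no more than 5% of rows overlapped") can be illustrated with a simple exact-match check. This is a minimal sketch under assumptions: the record does not say how overlap was defined, so the code assumes it is the share of synthetic rows that are identical, column for column, to some real row. The function name and the example rows are hypothetical.

```python
from typing import Hashable, Sequence


def row_overlap_fraction(
    real_rows: Sequence[Sequence[Hashable]],
    synthetic_rows: Sequence[Sequence[Hashable]],
) -> float:
    """Fraction of synthetic rows that exactly duplicate a real row.

    Assumption: exact equality across all columns; the paper may use a
    different definition (for example, a distance threshold).
    """
    if not synthetic_rows:
        return 0.0
    real_set = {tuple(row) for row in real_rows}
    matches = sum(1 for row in synthetic_rows if tuple(row) in real_set)
    return matches / len(synthetic_rows)


# Made-up rows: (age, stage, event, follow-up in months).
real = [(65, "T2", 1, 24.0), (71, "T3", 0, 60.0), (58, "T1", 0, 48.0)]
synthetic = [
    (65, "T2", 1, 24.0),  # exact copy of a real row
    (66, "T2", 1, 30.0),
    (70, "T3", 0, 55.0),
    (59, "T1", 0, 47.0),
]

overlap = row_overlap_fraction(real, synthetic)
print(overlap)  # 0.25
```

Under this reading, a candidate SDset would pass the reported privacy criterion when the returned value is at most 0.05.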
Author_xml – sequence: 1
  givenname: A.T.
  surname: Christoforou
  fullname: Christoforou, A.T.
  email: andreas.christoforou3@goc.com.cy
  organization: Department of Radiation Oncology, German Oncology Center, Limassol, Cyprus
– sequence: 2
  givenname: S.K.B.
  orcidid: 0000-0003-2727-8005
  surname: Spohn
  fullname: Spohn, S.K.B.
  organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 3
  givenname: T.
  surname: Sprave
  fullname: Sprave, T.
  organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 4
  givenname: T.
  orcidid: 0000-0001-6271-9385
  surname: Fechter
  fullname: Fechter, T.
  organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 5
  givenname: A.
  orcidid: 0000-0003-2022-897X
  surname: Rühle
  fullname: Rühle, A.
  organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 6
  givenname: N.H.
  orcidid: 0000-0003-2550-1410
  surname: Nicolay
  fullname: Nicolay, N.H.
  organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 7
  givenname: I.
  surname: Popp
  fullname: Popp, I.
  organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 8
  givenname: A.L.
  surname: Grosu
  fullname: Grosu, A.L.
  organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 9
  givenname: J.C.
  surname: Peeken
  fullname: Peeken, J.C.
  organization: Department of Radiation Oncology, Technical University Munich, Munich, Germany
– sequence: 10
  givenname: A.H.
  surname: Thieme
  fullname: Thieme, A.H.
  organization: Stanford Medicine, Stanford, USA
– sequence: 11
  givenname: T.
  orcidid: 0000-0002-3093-1696
  surname: Stylianopoulos
  fullname: Stylianopoulos, T.
  organization: Cancer Biophysics Lab, Department of Mechanical Engineering, University of Cyprus, Cyprus
– sequence: 12
  givenname: I.
  surname: Strouthos
  fullname: Strouthos, I.
  organization: Department of Radiation Oncology, German Oncology Center, Limassol, Cyprus
– sequence: 13
  givenname: K.
  orcidid: 0000-0003-1391-6600
  surname: Ferentinos
  fullname: Ferentinos, K.
  organization: Department of Radiation Oncology, German Oncology Center, Limassol, Cyprus
– sequence: 14
  givenname: Y.
  orcidid: 0000-0002-9367-4906
  surname: Roussakis
  fullname: Roussakis, Y.
  organization: Department of Radiation Oncology, German Oncology Center, Limassol, Cyprus
– sequence: 15
  givenname: C.
  surname: Zamboglou
  fullname: Zamboglou, C.
  organization: Department of Radiation Oncology, German Oncology Center, Limassol, Cyprus
BackLink https://www.ncbi.nlm.nih.gov/pubmed/40273819$$D View this record in MEDLINE/PubMed
ContentType Journal Article
Copyright © 2025 Elsevier Ltd. All rights reserved.
DOI 10.1016/j.compbiomed.2025.110198
Discipline Medicine
IsPeerReviewed true
IsScholarly true
Issue Pt A
Keywords: Deep learning; Medicine; Evaluation; Artificial Intelligence; Survival Analysis; Oncology; Synthetic Data; Framework
PMID 40273819
PublicationTitleAlternate Comput Biol Med
  publication-title: J. Comput. Commun.
  doi: 10.4236/jcc.2024.1211004
– year: 2024
  ident: 10.1016/j.compbiomed.2025.110198_bib15
  article-title: Comprehensive exploration of synthetic data generation: a survey
  publication-title: ArXiv,arXiv:2401.02524
– volume: 6
  year: 2023
  ident: 10.1016/j.compbiomed.2025.110198_bib20
  article-title: Evaluation of concomitant systemic treatment in older adults with head and neck squamous cell carcinoma undergoing definitive radiotherapy
  publication-title: JAMA Netw. Open
  doi: 10.1001/jamanetworkopen.2023.0090
– volume: 30
  start-page: 1105
  issue: 10
  year: 2011
  ident: 10.1016/j.compbiomed.2025.110198_bib22
  article-title: On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data
  publication-title: Stat. Med.
  doi: 10.1002/sim.4154
– year: 2018
  ident: 10.1016/j.compbiomed.2025.110198_bib12
  article-title: A probe towards understanding gan and vae models
  publication-title: arXiv preprint arXiv:1812.05676
– year: 2024
  ident: 10.1016/j.compbiomed.2025.110198_bib14
  article-title: An evaluation framework for synthetic data generation models
  doi: 10.1007/978-3-031-63219-8_24
SubjectTerms Artificial Intelligence
Databases, Factual
Deep learning
Evaluation
Female
Framework
Humans
Internal Medicine
Machine Learning
Male
Medicine
Neoplasms - mortality
Oncology
Other
Prostatic Neoplasms - mortality
Retrospective Studies
Survival Analysis
Synthetic Data
Title A framework to create, evaluate and select synthetic datasets for survival prediction in oncology
URI https://www.clinicalkey.com/#!/content/1-s2.0-S0010482525005499
https://www.clinicalkey.es/playcontent/1-s2.0-S0010482525005499
https://dx.doi.org/10.1016/j.compbiomed.2025.110198
https://www.ncbi.nlm.nih.gov/pubmed/40273819
https://www.proquest.com/docview/3194651407
Volume 192