A framework to create, evaluate and select synthetic datasets for survival prediction in oncology
| Published in: | Computers in biology and medicine Vol. 192; no. Pt A; p. 110198 |
|---|---|
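| Main Authors: | Christoforou, A.T., Spohn, S.K.B., Sprave, T., Fechter, T., Rühle, A., Nicolay, N.H., Popp, I., Grosu, A.L., Peeken, J.C., Thieme, A.H., Stylianopoulos, T., Strouthos, I., Ferentinos, K., Roussakis, Y., Zamboglou, C. |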
| Format: | Journal Article |
| Language: | English |
| Published: | United States, Elsevier Ltd, 01.06.2025 |
| Subjects: | |
| ISSN: | 0010-4825, 1879-0534 |
| Abstract | Data-driven decision-making in radiation oncology (RO) relies on integrating real-world data effectively. Synthetic data (SD), generated through machine learning, offers a solution by mimicking real-world data without compromising privacy. This paper presents a general framework for generating, evaluating, and selecting high-quality tabular SD for clinical use, focusing on survival datasets in RO.
Five retrospectively collected survival-based RO datasets (n = 1038 recurrent prostate cancer, n = 117 primary localised prostate cancer, n = 48 primary nodal positive (metastasised) prostate cancer, n = 1269 head and neck cancer, n = 353 gliomas) underwent cleaning and preparation. SD was generated using four different machine-learning models, with each model producing multiple variants. These were evaluated for privacy, clinical behaviour, and feature distribution using robust and interpretable metrics, with a single SDset being selected for each real-world dataset using a weighted ranking system.
The framework successfully generated high-quality SD for every real-world dataset, with the Tabular Variational Autoencoder producing the five best-performing SDsets considering all metrics. No more than 5 % of rows overlapped between each synthetic and real-world dataset. Cox proportional hazards models for the real-world and synthetic datasets achieved similar concordance indexes (average real-world C-index = 0.701 vs 0.699 for SD), with every SD hazard ratio falling within the 95 % confidence intervals of their real-world counterparts for 4 of the 5 real-world datasets.
The proposed framework enables the production and selection of SDsets that closely mirror real-world data characteristics, ensuring privacy and clinical utility in RO. This approach can facilitate data sharing in clinical research, addressing privacy-related barriers.
• A novel framework for generating and evaluating deep learning synthetic datasets.
• Framework tested on 5 clinical datasets containing survival and treatment details.
• Robust metrics assess privacy, clinical relevance, and data distribution.
• Synthetic data retains real-world characteristics, ensuring privacy and usability.
• Framework advances secure data sharing in medicine, addressing privacy issues. |
|---|---|
| ArticleNumber | 110198 |
| Author | Ferentinos, K. Nicolay, N.H. Sprave, T. Grosu, A.L. Zamboglou, C. Fechter, T. Rühle, A. Thieme, A.H. Spohn, S.K.B. Christoforou, A.T. Popp, I. Peeken, J.C. Roussakis, Y. Stylianopoulos, T. Strouthos, I. |
| Author_xml |
– sequence: 1 givenname: A.T. surname: Christoforou fullname: Christoforou, A.T. email: andreas.christoforou3@goc.com.cy organization: Department of Radiation Oncology, German Oncology Center, Limassol, Cyprus
– sequence: 2 givenname: S.K.B. orcidid: 0000-0003-2727-8005 surname: Spohn fullname: Spohn, S.K.B. organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 3 givenname: T. surname: Sprave fullname: Sprave, T. organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 4 givenname: T. orcidid: 0000-0001-6271-9385 surname: Fechter fullname: Fechter, T. organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 5 givenname: A. orcidid: 0000-0003-2022-897X surname: Rühle fullname: Rühle, A. organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 6 givenname: N.H. orcidid: 0000-0003-2550-1410 surname: Nicolay fullname: Nicolay, N.H. organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 7 givenname: I. surname: Popp fullname: Popp, I. organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 8 givenname: A.L. surname: Grosu fullname: Grosu, A.L. organization: Department of Radiation Oncology, Medical Center – University of Freiburg, Freiburg, Germany
– sequence: 9 givenname: J.C. surname: Peeken fullname: Peeken, J.C. organization: Department of Radiation Oncology, Technical University Munich, Munich, Germany
– sequence: 10 givenname: A.H. surname: Thieme fullname: Thieme, A.H. organization: Stanford Medicine, Stanford, USA
– sequence: 11 givenname: T. orcidid: 0000-0002-3093-1696 surname: Stylianopoulos fullname: Stylianopoulos, T. organization: Cancer Biophysics Lab, Department of Mechanical Engineering, University of Cyprus, Cyprus
– sequence: 12 givenname: I. surname: Strouthos fullname: Strouthos, I. organization: Department of Radiation Oncology, German Oncology Center, Limassol, Cyprus
– sequence: 13 givenname: K. orcidid: 0000-0003-1391-6600 surname: Ferentinos fullname: Ferentinos, K. organization: Department of Radiation Oncology, German Oncology Center, Limassol, Cyprus
– sequence: 14 givenname: Y. orcidid: 0000-0002-9367-4906 surname: Roussakis fullname: Roussakis, Y. organization: Department of Radiation Oncology, German Oncology Center, Limassol, Cyprus
– sequence: 15 givenname: C. surname: Zamboglou fullname: Zamboglou, C. organization: Department of Radiation Oncology, German Oncology Center, Limassol, Cyprus |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40273819 (View this record in MEDLINE/PubMed) |
| Cites_doi | 10.3389/fonc.2022.898774 10.1007/s11060-021-03926-0 10.1145/3636424 10.5194/gmd-14-5205-2021 10.1016/j.csbj.2024.07.005 10.1016/j.neucom.2022.04.053 10.1109/ACCESS.2025.3532128 10.1038/s41746-023-00927-3 10.1001/jamanetworkopen.2023.14748 10.69554/HFOS8421 10.4236/jcc.2024.1211004 10.1001/jamanetworkopen.2023.0090 10.1002/sim.4154 10.1007/978-3-031-63219-8_24 |
| ContentType | Journal Article |
| Copyright | Copyright © 2025 Elsevier Ltd. All rights reserved. |
| DBID | AAYXX CITATION CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1016/j.compbiomed.2025.110198 |
| DatabaseName | CrossRef Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Medicine |
| EISSN | 1879-0534 |
| EndPage | 110198 |
| ExternalDocumentID | 40273819 10_1016_j_compbiomed_2025_110198 S0010482525005499 1_s2_0_S0010482525005499 |
| Genre | Journal Article |
| ID | FETCH-LOGICAL-c348t-eda33b39d8b3196e3ecf26755ca5f73764ba43f9e6f5005df6fd30ce3057f2133 |
| ISSN | 0010-4825 1879-0534 |
| IngestDate | Sat Sep 27 21:24:41 EDT 2025 Sun May 25 01:41:15 EDT 2025 Sat Nov 29 07:38:33 EST 2025 Sat Sep 06 17:17:58 EDT 2025 Sun Sep 14 23:56:31 EDT 2025 Tue Oct 14 19:38:04 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | Pt A |
| Keywords | Deep learning Medicine Evaluation Artificial Intelligence Survival Analysis Oncology Synthetic Data Framework |
| Language | English |
| License | Copyright © 2025 Elsevier Ltd. All rights reserved. |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c348t-eda33b39d8b3196e3ecf26755ca5f73764ba43f9e6f5005df6fd30ce3057f2133 |
| ORCID | 0000-0003-2550-1410 0000-0003-1391-6600 0000-0003-2727-8005 0000-0002-3093-1696 0000-0001-6271-9385 0000-0003-2022-897X 0000-0002-9367-4906 |
| PMID | 40273819 |
| PQID | 3194651407 |
| PQPubID | 23479 |
| PageCount | 1 |
| ParticipantIDs | proquest_miscellaneous_3194651407 pubmed_primary_40273819 crossref_primary_10_1016_j_compbiomed_2025_110198 elsevier_sciencedirect_doi_10_1016_j_compbiomed_2025_110198 elsevier_clinicalkeyesjournals_1_s2_0_S0010482525005499 elsevier_clinicalkey_doi_10_1016_j_compbiomed_2025_110198 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-06-01 |
| PublicationDateYYYYMMDD | 2025-06-01 |
| PublicationDate_xml | – month: 06 year: 2025 text: 2025-06-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | Computers in biology and medicine |
| PublicationTitleAlternate | Comput Biol Med |
| PublicationYear | 2025 |
| Publisher | Elsevier Ltd |
| Publisher_xml | – name: Elsevier Ltd |
| References | Zamboglou, Peeken, Janbain (bib18) May 2023; 6 Klement, Popp, Kaul (bib21) December 2021; 156 Meyer, Nagler, Hogan (bib13) 2021; 14 Uno, Cai, Pencina, D'Agostino, Wei (bib22) May 2011; 30 Pezoulas, Zaridis, Mylona (bib16) July 2024; 23 Spohn, Birkenmaier, Ruf (bib19) 2022; 12 Goodfellow, Pouget-Abadie, Mirza (bib10) 2014 Rühle, Marschner, Haderlein (bib20) February 2023; 6 Giuffrè, Shung (bib3) 2023; 6 Bauer, Trapp, Stenger (bib15) 2024 Dove, Phillips (bib1) 2015 Alwateer, Atlam, Abd El-Raouf, Ghoneim, Gad (bib4) November 2024; 12 Ghosheh, Li, Zhu (bib8) June 2024; 56 Livieris, Alimpertis, Domalis, Tsakalidis (bib14) 2024 Liu, Deho, Vadiee, Khalil, Joksimovic, Siemens (bib5) 2025 Synthetic Data Metrics. D'Acquisto (bib2) March 2024; 6 Rajabi A, Garibay OO. TabFairGAN: Fair Tabular Data Generation with Generative Adversarial Networks. Patki, Wedge, Veeramachaneni (bib23) 2016 Hernandez, Epelde, Alberdi, Cilla, Rankin (bib9) 2022; 493 Kiran, Rubini, Kumar (bib17) 2025; 13 Frid-Adar, Klang, Amitai, Goldberger, Greenspan (bib6) 2018 Accessed July 2024. Rashidian, Wang, Moffitt (bib7) 2020 Version 0.13.0. 
Available at Yale, Dash, Dutta, Isabelle, Pavao, Bennett (bib27) 2019 Xu, Skoularidou, Cuesta-Infante, Veeramachaneni (bib24) 2019; 32 Mi, Shen, Zhang (bib12) 2018 Lautrup, Hyrup, Zimek, Schneider-Kamp (bib26) 2024 Pezoulas (10.1016/j.compbiomed.2025.110198_bib16) 2024; 23 Xu (10.1016/j.compbiomed.2025.110198_bib24) 2019; 32 Hernandez (10.1016/j.compbiomed.2025.110198_bib9) 2022; 493 Klement (10.1016/j.compbiomed.2025.110198_bib21) 2021; 156 10.1016/j.compbiomed.2025.110198_bib11 Frid-Adar (10.1016/j.compbiomed.2025.110198_bib6) 2018 Liu (10.1016/j.compbiomed.2025.110198_bib5) 2025 Rashidian (10.1016/j.compbiomed.2025.110198_bib7) 2020 Rühle (10.1016/j.compbiomed.2025.110198_bib20) 2023; 6 Patki (10.1016/j.compbiomed.2025.110198_bib23) 2016 Ghosheh (10.1016/j.compbiomed.2025.110198_bib8) 2024; 56 Yale (10.1016/j.compbiomed.2025.110198_bib27) 2019 Zamboglou (10.1016/j.compbiomed.2025.110198_bib18) 2023; 6 Livieris (10.1016/j.compbiomed.2025.110198_bib14) 2024 Kiran (10.1016/j.compbiomed.2025.110198_bib17) 2025; 13 Alwateer (10.1016/j.compbiomed.2025.110198_bib4) 2024; 12 Goodfellow (10.1016/j.compbiomed.2025.110198_bib10) 2014 Mi (10.1016/j.compbiomed.2025.110198_bib12) 2018 Spohn (10.1016/j.compbiomed.2025.110198_bib19) 2022; 12 D'Acquisto (10.1016/j.compbiomed.2025.110198_bib2) 2024; 6 10.1016/j.compbiomed.2025.110198_bib25 Dove (10.1016/j.compbiomed.2025.110198_bib1) 2015 Uno (10.1016/j.compbiomed.2025.110198_bib22) 2011; 30 Meyer (10.1016/j.compbiomed.2025.110198_bib13) 2021; 14 Giuffrè (10.1016/j.compbiomed.2025.110198_bib3) 2023; 6 Bauer (10.1016/j.compbiomed.2025.110198_bib15) 2024 Lautrup (10.1016/j.compbiomed.2025.110198_bib26) 2024 |
| References_xml | – volume: 156 start-page: 407 year: December 2021 end-page: 417 ident: bib21 article-title: Accelerated hyper-versus normofractionated radiochemotherapy with temozolomide in patients with glioblastoma: a multicenter retrospective analysis publication-title: J. Neuro Oncol. – volume: 32 year: 2019 ident: bib24 article-title: Modeling tabular data using conditional gan publication-title: Adv. Neural Inf. Process. Syst. – volume: 14 start-page: 5205 year: 2021 end-page: 5215 ident: bib13 article-title: Copula-based synthetic data augmentation for machine-learning emulators publication-title: Geosci. Model Dev. (GMD) – volume: 6 start-page: 186 year: 2023 ident: bib3 article-title: Harnessing the power of synthetic data in healthcare: innovation, application, and privacy publication-title: npj Digit. Med. – start-page: 2672 year: 2014 end-page: 2680 ident: bib10 article-title: Generative adversarial nets. Paper presented at publication-title: 28thAdv. Neural Inf. Process. Syst. – reference: :Version 0.13.0. Available at: – start-page: 37 year: 2020 end-page: 48 ident: bib7 article-title: SMOOTH-GAN: towards sharp and smooth synthetic EHR data generation. Paper presented at publication-title: Artif. Intell. Med. – volume: 30 start-page: 1105 year: May 2011 end-page: 1117 ident: bib22 article-title: On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data publication-title: Stat. Med. – reference: Synthetic Data Metrics. – year: 2024 ident: bib15 article-title: Comprehensive exploration of synthetic data generation: a survey publication-title: ArXiv,arXiv:2401.02524 – volume: 12 start-page: 53 year: November 2024 end-page: 75 ident: bib4 article-title: Missing data imputation: a comprehensive review publication-title: J. Comput. Commun. 
– year: 2024 ident: bib26 article-title: SynthEval: a framework for detailed utility and privacy evaluation of tabular synthetic data publication-title: arXiv preprint arXiv:2404.15821 – volume: 56 year: June 2024 ident: bib8 article-title: A survey of generative adversarial networks for synthesizing structured electronic health records publication-title: ACM Comput. Surv. – volume: 6 year: February 2023 ident: bib20 article-title: Evaluation of concomitant systemic treatment in older adults with head and neck squamous cell carcinoma undergoing definitive radiotherapy publication-title: JAMA Netw. Open – year: 2024 ident: bib14 article-title: An evaluation framework for synthetic data generation models publication-title: IFIP Advances in Information and Communication Technology – year: 2019 ident: bib27 article-title: Privacy preserving synthetic health dat. Paper presented at publication-title: ESANN 2019 - European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning – volume: 23 start-page: 2892 year: July 2024 end-page: 2910 ident: bib16 article-title: Synthetic data generation methods in healthcare: a review on open-source tools and methods publication-title: Comput. Struct. Biotechnol. J. 
– year: 2018 ident: bib6 article-title: Synthetic data augmentation using GAN for improved liver lesion classification publication-title: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018) – year: 2018 ident: bib12 article-title: A probe towards understanding gan and vae models publication-title: arXiv preprint arXiv:1812.05676 – volume: 6 start-page: 227 year: March 2024 end-page: 239 ident: bib2 article-title: Synthetic data and data protection laws publication-title: Journal of Data Protection & Privacy – volume: 6 year: May 2023 ident: bib18 article-title: Development and validation of a multi-institutional nomogram of outcomes for PSMA-PET-based salvage radiotherapy for recurrent prostate cancer publication-title: JAMA Netw. Open – reference: . Accessed July 2024. – year: 2025 ident: bib5 article-title: Can synthetic data be fair and private? A comparative study of synthetic data generation and fairness algorithms publication-title: Paper Presented at – year: 2015 ident: bib1 article-title: Privacy law, data sharing policies, and medical data: a comparative perspective publication-title: Medical Data Privacy Handbook – year: 2016 ident: bib23 article-title: The synthetic data vault publication-title: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) – volume: 493 start-page: 28 year: 2022 end-page: 45 ident: bib9 article-title: Synthetic data generation for tabular health records: a systematic review publication-title: Neurocomputing – volume: 12 year: 2022 ident: bib19 article-title: Risk factors for biochemical recurrence after PSMA-PET-guided definitive radiotherapy in patients with de novo lymph node-positive prostate cancer publication-title: Front. Oncol. – reference: Rajabi A, Garibay OO. TabFairGAN: Fair Tabular Data Generation with Generative Adversarial Networks. 
– volume: 13 start-page: 15795 year: 2025 end-page: 15811 ident: bib17 article-title: Comprehensive review of privacy, utility, and fairness offered by synthetic data publication-title: IEEE Access – volume: 12 year: 2022 ident: 10.1016/j.compbiomed.2025.110198_bib19 article-title: Risk factors for biochemical recurrence after PSMA-PET-guided definitive radiotherapy in patients with de novo lymph node-positive prostate cancer publication-title: Front. Oncol. doi: 10.3389/fonc.2022.898774 – year: 2015 ident: 10.1016/j.compbiomed.2025.110198_bib1 article-title: Privacy law, data sharing policies, and medical data: a comparative perspective – year: 2016 ident: 10.1016/j.compbiomed.2025.110198_bib23 article-title: The synthetic data vault – ident: 10.1016/j.compbiomed.2025.110198_bib25 – year: 2018 ident: 10.1016/j.compbiomed.2025.110198_bib6 article-title: Synthetic data augmentation using GAN for improved liver lesion classification – start-page: 37 year: 2020 ident: 10.1016/j.compbiomed.2025.110198_bib7 article-title: SMOOTH-GAN: towards sharp and smooth synthetic EHR data generation. Paper presented at – volume: 156 start-page: 407 year: 2021 ident: 10.1016/j.compbiomed.2025.110198_bib21 article-title: Accelerated hyper-versus normofractionated radiochemotherapy with temozolomide in patients with glioblastoma: a multicenter retrospective analysis publication-title: J. Neuro Oncol. doi: 10.1007/s11060-021-03926-0 – volume: 32 year: 2019 ident: 10.1016/j.compbiomed.2025.110198_bib24 article-title: Modeling tabular data using conditional gan publication-title: Adv. Neural Inf. Process. Syst. 
– year: 2024 ident: 10.1016/j.compbiomed.2025.110198_bib26 article-title: SynthEval: a framework for detailed utility and privacy evaluation of tabular synthetic data publication-title: arXiv preprint arXiv:2404.15821 – volume: 56 issue: 6 year: 2024 ident: 10.1016/j.compbiomed.2025.110198_bib8 article-title: A survey of generative adversarial networks for synthesizing structured electronic health records publication-title: ACM Comput. Surv. doi: 10.1145/3636424 – volume: 14 start-page: 5205 year: 2021 ident: 10.1016/j.compbiomed.2025.110198_bib13 article-title: Copula-based synthetic data augmentation for machine-learning emulators publication-title: Geosci. Model Dev. (GMD) doi: 10.5194/gmd-14-5205-2021 – volume: 23 start-page: 2892 year: 2024 ident: 10.1016/j.compbiomed.2025.110198_bib16 article-title: Synthetic data generation methods in healthcare: a review on open-source tools and methods publication-title: Comput. Struct. Biotechnol. J. doi: 10.1016/j.csbj.2024.07.005 – ident: 10.1016/j.compbiomed.2025.110198_bib11 – year: 2025 ident: 10.1016/j.compbiomed.2025.110198_bib5 article-title: Can synthetic data be fair and private? A comparative study of synthetic data generation and fairness algorithms – volume: 493 start-page: 28 year: 2022 ident: 10.1016/j.compbiomed.2025.110198_bib9 article-title: Synthetic data generation for tabular health records: a systematic review publication-title: Neurocomputing doi: 10.1016/j.neucom.2022.04.053 – volume: 13 start-page: 15795 year: 2025 ident: 10.1016/j.compbiomed.2025.110198_bib17 article-title: Comprehensive review of privacy, utility, and fairness offered by synthetic data publication-title: IEEE Access doi: 10.1109/ACCESS.2025.3532128 – year: 2019 ident: 10.1016/j.compbiomed.2025.110198_bib27 article-title: Privacy preserving synthetic health dat. 
Paper presented at – volume: 6 start-page: 186 year: 2023 ident: 10.1016/j.compbiomed.2025.110198_bib3 article-title: Harnessing the power of synthetic data in healthcare: innovation, application, and privacy publication-title: npj Digit. Med. doi: 10.1038/s41746-023-00927-3 – volume: 6 issue: 5 year: 2023 ident: 10.1016/j.compbiomed.2025.110198_bib18 article-title: Development and validation of a multi-institutional nomogram of outcomes for PSMA-PET-based salvage radiotherapy for recurrent prostate cancer publication-title: JAMA Netw. Open doi: 10.1001/jamanetworkopen.2023.14748 – volume: 6 start-page: 227 issue: 3 year: 2024 ident: 10.1016/j.compbiomed.2025.110198_bib2 article-title: Synthetic data and data protection laws publication-title: Journal of Data Protection & Privacy doi: 10.69554/HFOS8421 – start-page: 2672 year: 2014 ident: 10.1016/j.compbiomed.2025.110198_bib10 article-title: Generative adversarial nets. Paper presented at publication-title: 28th Adv. Neural Inf. Process. Syst. – volume: 12 start-page: 53 issue: 11 year: 2024 ident: 10.1016/j.compbiomed.2025.110198_bib4 article-title: Missing data imputation: a comprehensive review publication-title: J. Comput. Commun. doi: 10.4236/jcc.2024.1211004 – year: 2024 ident: 10.1016/j.compbiomed.2025.110198_bib15 article-title: Comprehensive exploration of synthetic data generation: a survey publication-title: arXiv preprint arXiv:2401.02524 – volume: 6 year: 2023 ident: 10.1016/j.compbiomed.2025.110198_bib20 article-title: Evaluation of concomitant systemic treatment in older adults with head and neck squamous cell carcinoma undergoing definitive radiotherapy publication-title: JAMA Netw. Open doi: 10.1001/jamanetworkopen.2023.0090 – volume: 30 start-page: 1105 issue: 10 year: 2011 ident: 10.1016/j.compbiomed.2025.110198_bib22 article-title: On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data publication-title: Stat. Med.
doi: 10.1002/sim.4154 – year: 2018 ident: 10.1016/j.compbiomed.2025.110198_bib12 article-title: A probe towards understanding GAN and VAE models publication-title: arXiv preprint arXiv:1812.05676 – year: 2024 ident: 10.1016/j.compbiomed.2025.110198_bib14 article-title: An evaluation framework for synthetic data generation models doi: 10.1007/978-3-031-63219-8_24 |
| Snippet | Data-driven decision-making in radiation oncology (RO) relies on integrating real-world data effectively. Synthetic data (SD), generated through machine... |
| SourceID | proquest; pubmed; crossref; elsevier |
| SourceType | Aggregation Database; Index Database; Publisher |
| StartPage | 110198 |
| SubjectTerms | Artificial Intelligence; Databases, Factual; Deep learning; Evaluation; Female; Framework; Humans; Internal Medicine; Machine Learning; Male; Medicine; Neoplasms - mortality; Oncology; Other; Prostatic Neoplasms - mortality; Retrospective Studies; Survival Analysis; Synthetic Data |
| Title | A framework to create, evaluate and select synthetic datasets for survival prediction in oncology |
| URI | https://www.clinicalkey.com/#!/content/1-s2.0-S0010482525005499 https://www.clinicalkey.es/playcontent/1-s2.0-S0010482525005499 https://dx.doi.org/10.1016/j.compbiomed.2025.110198 https://www.ncbi.nlm.nih.gov/pubmed/40273819 https://www.proquest.com/docview/3194651407 |
| Volume | 192 |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+framework+to+create%2C+evaluate+and+select+synthetic+datasets+for+survival+prediction+in+oncology&rft.jtitle=Computers+in+biology+and+medicine&rft.au=Christoforou%2C+A.T.&rft.au=Spohn%2C+S.K.B.&rft.au=Sprave%2C+T.&rft.au=Fechter%2C+T.&rft.date=2025-06-01&rft.issn=0010-4825&rft.volume=192&rft.spage=110198&rft_id=info:doi/10.1016%2Fj.compbiomed.2025.110198&rft.externalDBID=n%2Fa&rft.externalDocID=10_1016_j_compbiomed_2025_110198 |