Machine Learning Model for Predicting Postoperative Survival of Patients with Colorectal Cancer

Machine learning (ML) is a strong candidate for making accurate predictions because it can apply powerful computational algorithms to large amounts of data. We developed an ML-based model to predict the survival of patients with colorectal cancer (CRC) using data from two independent datasets. A total of 364,...

Bibliographic details
Published in: Cancer research and treatment, Volume 54; Issue 2; p. 517
Main authors: Osman, Mohamed Hosny, Mohamed, Reham Hosny, Sarhan, Hossam Mohamed, Park, Eun Jung, Baik, Seung Hyuk, Lee, Kang Young, Kang, Jeonghyun
Format: Journal Article
Language: English
Publication details: Korea (South) 01.04.2022
ISSN: 2005-9256
Abstract Purpose: Machine learning (ML) is a strong candidate for making accurate predictions because it can apply powerful computational algorithms to large amounts of data. We developed an ML-based model to predict the survival of patients with colorectal cancer (CRC) using data from two independent datasets. Materials and Methods: A total of 364,316 and 1,572 CRC patients were included from the Surveillance, Epidemiology, and End Results (SEER) dataset and a Korean dataset, respectively. Because SEER combines data from 18 cancer registries, internal validation was done using 18-fold cross-validation; external validation was then performed by testing the trained model on the Korean dataset. Performance was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, and positive predictive value. Results: Clinicopathological characteristics differed significantly between the two datasets, and SEER showed a significantly lower 5-year survival rate than the Korean dataset (60.1% vs. 75.3%, p < 0.001). The ML-based model using the Light Gradient Boosting algorithm predicted 5-year survival better than the American Joint Committee on Cancer stage (AUROC, 0.804 vs. 0.736; p < 0.001). The features that most influenced model performance were age, number of examined lymph nodes, and tumor size. Sensitivity and positive predictive values for predicting 5-year survival for the classes dead or alive were reported as 68.14%, 77.51% and 49.88%, 88.1%, respectively, in the validation set. Survival probability can be checked using the web-based survival predictor (http://colorectalcancer.pythonanywhere.com). Conclusion: The ML-based model performed much better than staging in the individualized estimation of survival of patients with CRC.
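The validation scheme described in the abstract (one fold per SEER registry, scored by AUROC, sensitivity, and positive predictive value) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `leave_one_group_out`, `auroc`, and `sensitivity_and_ppv` are hypothetical, the registry labels are toy values, and the LightGBM model itself is not reproduced here. Any classifier that outputs a score could be trained on each `train_idx` and evaluated on the matching `test_idx`.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one fold per distinct group.

    With 18 registry labels this produces 18 folds, each holding out one registry.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g, test_idx in by_group.items():
        train_idx = [i for h, idx in by_group.items() if h != g for i in idx]
        yield g, train_idx, test_idx


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as 0.5. This pairwise form is O(n_pos * n_neg): fine for a
    sketch, too slow for a dataset of hundreds of thousands of patients.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_and_ppv(y_true, y_pred, positive):
    """Sensitivity (recall) and positive predictive value for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sens, ppv


if __name__ == "__main__":
    # Toy registry labels (assumption: real data would carry 18 distinct labels).
    registries = ["A", "A", "B", "C", "C", "C"]
    for held_out, train_idx, test_idx in leave_one_group_out(registries):
        print(held_out, train_idx, test_idx)
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Reporting sensitivity and positive predictive value separately for each class, as the abstract does for "dead" and "alive", amounts to calling `sensitivity_and_ppv` once per class with a different `positive` label.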
Author Osman, Mohamed Hosny
Baik, Seung Hyuk
Lee, Kang Young
Sarhan, Hossam Mohamed
Kang, Jeonghyun
Mohamed, Reham Hosny
Park, Eun Jung
Author_xml – sequence: 1
  givenname: Mohamed Hosny
  surname: Osman
  fullname: Osman, Mohamed Hosny
  organization: Faculty of Medicine, Zagazig University, Zagazig, Egypt
– sequence: 2
  givenname: Reham Hosny
  surname: Mohamed
  fullname: Mohamed, Reham Hosny
  organization: Faculty of Medicine, Zagazig University, Zagazig, Egypt
– sequence: 3
  givenname: Hossam Mohamed
  surname: Sarhan
  fullname: Sarhan, Hossam Mohamed
  organization: Faculty of Pharmacy, British University in Egypt (BUE), El Shorouk, Egypt
– sequence: 4
  givenname: Eun Jung
  surname: Park
  fullname: Park, Eun Jung
  organization: Department of Surgery, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
– sequence: 5
  givenname: Seung Hyuk
  surname: Baik
  fullname: Baik, Seung Hyuk
  organization: Department of Surgery, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
– sequence: 6
  givenname: Kang Young
  surname: Lee
  fullname: Lee, Kang Young
  organization: Department of Surgery, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
– sequence: 7
  givenname: Jeonghyun
  surname: Kang
  fullname: Kang, Jeonghyun
  organization: Department of Surgery, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
BackLink https://www.ncbi.nlm.nih.gov/pubmed/34126702$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_3748_wjg_v31_i18_106670
crossref_primary_10_1016_j_health_2022_100132
crossref_primary_10_1016_j_heliyon_2024_e41443
crossref_primary_10_3389_froh_2024_1462873
crossref_primary_10_1002_mef2_100
crossref_primary_10_1007_s00432_023_04880_2
crossref_primary_10_1016_j_suronc_2023_102009
crossref_primary_10_1016_j_ejso_2025_110194
crossref_primary_10_3389_fonc_2024_1396726
crossref_primary_10_1016_j_cmpb_2024_108159
crossref_primary_10_1016_j_suronc_2024_102079
crossref_primary_10_1016_j_cmpb_2025_108874
crossref_primary_10_1186_s12874_025_02463_y
crossref_primary_10_1371_journal_pone_0278562
crossref_primary_10_3748_wjg_v31_i30_108431
crossref_primary_10_1186_s12885_025_14303_9
crossref_primary_10_3389_fmed_2024_1266278
crossref_primary_10_1177_03000605231198725
crossref_primary_10_1371_journal_pone_0280606
crossref_primary_10_1002_hsr2_70336
ContentType Journal Article
DOI 10.4143/crt.2021.206
DatabaseName Medline
MEDLINE
MEDLINE (Ovid)
PubMed
MEDLINE - Academic
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE
MEDLINE - Academic
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
Discipline Medicine
EISSN 2005-9256
ExternalDocumentID 34126702
Genre Journal Article
ISICitedReferencesCount 21
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000789993400020&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 2005-9256
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 2
Keywords SEER
Colorectal neoplasms
Area under the curve
LightGBM
Mortality
Machine learning
Language English
OpenAccessLink https://pubmed.ncbi.nlm.nih.gov/PMC9016295
PMID 34126702
PQID 2541320089
PQPubID 23479
ParticipantIDs proquest_miscellaneous_2541320089
pubmed_primary_34126702
PublicationCentury 2000
PublicationDate 2022-04-01
PublicationDateYYYYMMDD 2022-04-01
PublicationDate_xml – month: 04
  year: 2022
  text: 2022-04-01
  day: 01
PublicationDecade 2020
PublicationPlace Korea (South)
PublicationPlace_xml – name: Korea (South)
PublicationTitle Cancer research and treatment
PublicationTitleAlternate Cancer Res Treat
PublicationYear 2022
SourceID proquest
pubmed
SourceType Aggregation Database
Index Database
StartPage 517
SubjectTerms Colorectal Neoplasms - pathology
Humans
Machine Learning
Predictive Value of Tests
ROC Curve
Survival Rate
Title Machine Learning Model for Predicting Postoperative Survival of Patients with Colorectal Cancer
URI https://www.ncbi.nlm.nih.gov/pubmed/34126702
https://www.proquest.com/docview/2541320089
Volume 54
WOSCitedRecordID wos000789993400020&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D