Machine Learning Model for Predicting Postoperative Survival of Patients with Colorectal Cancer
Machine learning (ML) is a strong candidate for making accurate predictions because it can apply powerful computational algorithms to large amounts of data. We developed an ML-based model to predict the survival of patients with colorectal cancer (CRC) using data from two independent datasets. A total of 364,...
| Published in: | Cancer research and treatment, Volume 54, Issue 2, p. 517 |
|---|---|
| Main authors: | Osman, Mohamed Hosny; Mohamed, Reham Hosny; Sarhan, Hossam Mohamed; Park, Eun Jung; Baik, Seung Hyuk; Lee, Kang Young; Kang, Jeonghyun |
| Format: | Journal Article |
| Language: | English |
| Published: | Korea (South), 1 April 2022 |
| ISSN: | 2005-9256 |
| Abstract | Purpose: Machine learning (ML) is a strong candidate for making accurate predictions because it can apply powerful computational algorithms to large amounts of data. We developed an ML-based model to predict the survival of patients with colorectal cancer (CRC) using data from two independent datasets. Materials and Methods: A total of 364,316 and 1,572 CRC patients were included from the Surveillance, Epidemiology, and End Results (SEER) program and a Korean dataset, respectively. As SEER combines data from 18 cancer registries, internal validation was done using 18-fold cross-validation; external validation was then performed by testing the trained model on the Korean dataset. Performance was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, and positive predictive value. Results: Clinicopathological characteristics differed significantly between the two datasets, and SEER showed a significantly lower 5-year survival rate than the Korean dataset (60.1% vs. 75.3%, p < 0.001). The ML-based model using the light gradient boosting (LightGBM) algorithm predicted 5-year survival better than the American Joint Committee on Cancer (AJCC) stage (AUROC, 0.804 vs. 0.736; p < 0.001). The most important features influencing model performance were age, number of examined lymph nodes, and tumor size. In the validation set, sensitivity for predicting 5-year survival was 68.14% for the dead class and 77.51% for the alive class, and the positive predictive value was 49.88% and 88.1%, respectively. Survival probability can be checked using the web-based survival predictor (http://colorectalcancer.pythonanywhere.com). Conclusion: The ML-based model performed much better than staging in individualized estimation of survival of patients with CRC. |
|---|---|
| Author | Osman, Mohamed Hosny; Baik, Seung Hyuk; Lee, Kang Young; Sarhan, Hossam Mohamed; Kang, Jeonghyun; Mohamed, Reham Hosny; Park, Eun Jung |
| Author_xml | 1. Osman, Mohamed Hosny (Faculty of Medicine, Zagazig University, Zagazig, Egypt); 2. Mohamed, Reham Hosny (Faculty of Medicine, Zagazig University, Zagazig, Egypt); 3. Sarhan, Hossam Mohamed (Faculty of Pharmacy, British University in Egypt (BUE), El Shorouk, Egypt); 4. Park, Eun Jung (Department of Surgery, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea); 5. Baik, Seung Hyuk (Department of Surgery, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea); 6. Lee, Kang Young (Department of Surgery, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea); 7. Kang, Jeonghyun (Department of Surgery, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/34126702 (MEDLINE/PubMed record) |
| ContentType | Journal Article |
| DOI | 10.4143/crt.2021.206 |
| Discipline | Medicine |
| EISSN | 2005-9256 |
| ISICitedReferencesCount | 21 |
| ISSN | 2005-9256 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Keywords | SEER; Colorectal neoplasms; Area under the curve; LightGBM; Mortality; Machine learning |
| Language | English |
| OpenAccessLink | https://pubmed.ncbi.nlm.nih.gov/PMC9016295 |
| PMID | 34126702 |
| PublicationDate | 2022-04-01 |
| PublicationPlace | Korea (South) |
| PublicationTitle | Cancer research and treatment |
| PublicationTitleAlternate | Cancer Res Treat |
| PublicationYear | 2022 |
| StartPage | 517 |
| SubjectTerms | Colorectal Neoplasms - pathology; Humans; Machine Learning; Predictive Value of Tests; ROC Curve; Survival Rate |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/34126702 https://www.proquest.com/docview/2541320089 |
| Volume | 54 |
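
The abstract ties the 18-fold cross-validation to the 18 SEER registries. One plausible reading, which the abstract does not state outright, is that each fold holds out one registry. A minimal Python sketch of such a leave-one-registry-out split follows; the function name `leave_one_group_out` and the registry labels are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices), one fold per group.

    `groups` holds one label per record (here: a hypothetical registry ID).
    """
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        # Train on every record that does not belong to the held-out group.
        train_idx = sorted(
            i for group, idxs in by_group.items() if group != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Toy example with three made-up registries; 18 registries would give 18 folds.
registries = ["reg01", "reg02", "reg01", "reg03", "reg02"]
for held_out, train_idx, test_idx in leave_one_group_out(registries):
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Splitting by registry rather than at random keeps all patients from one registry on the same side of the split, so each fold tests the model on a registry it has not seen.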
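
The abstract compares the model with AJCC stage by AUROC (0.804 vs. 0.736). As a reminder of what that number measures, the sketch below computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. It is a generic illustration on made-up data, not the authors' evaluation code, and the function name `auroc` is an assumption.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes; scores: predicted scores, higher = more likely 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up labels and scores, for illustration only.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of records, which is fine for illustration; a rank-based implementation would be preferable for a dataset the size of SEER.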
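
The abstract reports sensitivity and positive predictive value as two pairs of numbers without saying which belongs to which class. The sketch below checks one reading: sensitivity is 68.14% for the dead class and 77.51% for the alive class, and the validation set has the Korean dataset's 75.3% 5-year survival. Both points are assumptions made for this check, and the function name `ppv_per_class` is hypothetical.

```python
def ppv_per_class(sens_dead, sens_alive, frac_alive):
    """Positive predictive value for each class of a two-class problem.

    sens_dead / sens_alive: per-class sensitivity (recall).
    frac_alive: fraction of patients who are alive at 5 years.
    """
    frac_dead = 1.0 - frac_alive

    # Predicted "dead": true deaths caught plus survivors misclassified as dead.
    tp_dead = frac_dead * sens_dead
    fp_dead = frac_alive * (1.0 - sens_alive)

    # Predicted "alive": true survivors caught plus deaths misclassified as alive.
    tp_alive = frac_alive * sens_alive
    fp_alive = frac_dead * (1.0 - sens_dead)

    return tp_dead / (tp_dead + fp_dead), tp_alive / (tp_alive + fp_alive)


# Assumed inputs: sensitivities from the abstract, survival rate of the Korean dataset.
ppv_dead, ppv_alive = ppv_per_class(0.6814, 0.7751, 0.753)
print(round(ppv_dead, 3), round(ppv_alive, 3))
```

Under these assumptions the result is about 0.498 for the dead class and 0.881 for the alive class, close to the reported 49.88% and 88.1%, which supports reading the first pair as sensitivities and the second pair as positive predictive values.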