Developing a Clinical Prediction Score: Comparing Prediction Accuracy of Integer Scores to Statistical Regression Models
| Published in: | Anesthesia and analgesia, Vol. 132, No. 6, p. 1603 |
|---|---|
| Main authors: | Subramanian, Vigneshwar; Mascha, Edward J; Kattan, Michael W |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 1 June 2021 |
| Subjects: | |
| ISSN: | 1526-7598, 1526-7598 |
| Abstract | Researchers often convert prediction tools built on statistical regression models into integer scores and risk classification systems in the name of simplicity. However, this workflow discards useful information and reduces prediction accuracy. We, therefore, investigated the impact on prediction accuracy when researchers simplify a regression model into an integer score using a simulation study and an example clinical data set. Simulated independent training and test sets (n = 1000) were randomly generated such that a logistic regression model would perform at a specified target area under the receiver operating characteristic curve (AUC) of 0.7, 0.8, or 0.9. After fitting a logistic regression with continuous covariates to each data set, continuous variables were dichotomized using data-dependent cut points. A logistic regression was refit, and the coefficients were scaled and rounded to create an integer score. A risk classification system was built by stratifying integer scores into low-, intermediate-, and high-risk tertiles. Discrimination and calibration were assessed by calculating the AUC and index of prediction accuracy (IPA) for each model. The optimism in performance between the training set and test set was calculated for both AUC and IPA. The logistic regression model using the continuous form of covariates outperformed all other models. In the simulation study, converting the logistic regression model to an integer score and subsequent risk classification system incurred an average decrease of 0.057-0.094 in AUC, and an absolute 6.2%-17.5% in IPA. The largest decrease in both AUC and IPA occurred in the dichotomization step. The dichotomization and risk stratification steps also increased the optimism of the resulting models, such that they appeared to be able to predict better than they actually would on new data. In the clinical data set, converting the logistic regression with continuous covariates to an integer score incurred a decrease in externally validated AUC of 0.06 and a decrease in externally validated IPA of 13%. Converting a regression model to an integer score decreases model performance considerably. Therefore, we recommend developing a regression model that incorporates all available information to make the most accurate predictions possible, and using the unaltered regression model when making predictions for individual patients. In all cases, researchers should be mindful that they correctly validate the specific model that is intended for clinical use. |
|---|---|
| Author | Subramanian, Vigneshwar; Kattan, Michael W; Mascha, Edward J |
| Author_xml | 1. Subramanian, Vigneshwar (From the Cleveland Clinic Lerner College of Medicine at Case Western Reserve University, Cleveland, Ohio); 2. Mascha, Edward J (Departments of Quantitative Health Sciences and Outcomes Research and); 3. Kattan, Michael W (Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio) |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/33464759$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1186_s13643_021_01841_z crossref_primary_10_1213_ANE_0000000000005773 crossref_primary_10_1213_ANE_0000000000006558 crossref_primary_10_1007_s11701_024_02152_w crossref_primary_10_1038_s41598_022_17916_3 crossref_primary_10_1007_s00380_023_02336_8 crossref_primary_10_1016_j_jclinane_2021_110511 crossref_primary_10_1016_j_jpsychores_2023_111385 crossref_primary_10_1136_bmjopen_2022_066197 crossref_primary_10_1002_cam4_70295 crossref_primary_10_1038_s41598_022_14827_1 crossref_primary_10_1097_ALN_0000000000003871 crossref_primary_10_1016_j_ijar_2024_109190 crossref_primary_10_1053_j_jvca_2023_06_025 crossref_primary_10_1213_ANE_0000000000006418 |
| ContentType | Journal Article |
| Copyright | Copyright © 2021 International Anesthesia Research Society. |
| DOI | 10.1213/ANE.0000000000005362 |
| DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| EISSN | 1526-7598 |
| ExternalDocumentID | 33464759 |
| Genre | Journal Article; Comparative Study |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 20 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=00000539-202106000-00015&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1526-7598 |
| IngestDate | Sun Nov 09 11:17:08 EST 2025 Thu Apr 03 06:56:59 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 6 |
| Language | English |
| License | Copyright © 2021 International Anesthesia Research Society. |
| LinkModel | DirectLink |
| Notes | ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 23 |
| PMID | 33464759 |
| PQID | 2479040172 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_2479040172 pubmed_primary_33464759 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-06-01 20210601 |
| PublicationDateYYYYMMDD | 2021-06-01 |
| PublicationDate_xml | – month: 06 year: 2021 text: 2021-06-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | Anesthesia and analgesia |
| PublicationTitleAlternate | Anesth Analg |
| PublicationYear | 2021 |
| SSID | ssj0001086 |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 1603 |
| SubjectTerms | Area Under Curve; Computer Simulation - statistics & numerical data; Computer Simulation - trends; Forecasting; Humans; Models, Statistical; Regression Analysis; ROC Curve; Stroke - diagnosis; Stroke - epidemiology |
| Title | Developing a Clinical Prediction Score: Comparing Prediction Accuracy of Integer Scores to Statistical Regression Models |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/33464759 https://www.proquest.com/docview/2479040172 |
| Volume | 132 |
| WOSCitedRecordID | wos00000539-202106000-00015&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
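
The workflow summarized in the abstract (dichotomize a continuous covariate, scale and round coefficients into integer points, then compare AUC and IPA) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the coefficient values, the cut point, and the eight-patient data set are all hypothetical, and IPA is computed here as one minus the ratio of the model's Brier score to that of a null model predicting the prevalence, which is an assumption about the definition rather than something this record states.

```python
def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ipa(probs, labels):
    """Assumed definition: 1 - Brier(model) / Brier(null model)."""
    n = len(labels)
    prevalence = sum(labels) / n
    brier = sum((p - y) ** 2 for p, y in zip(probs, labels)) / n
    brier_null = sum((prevalence - y) ** 2 for y in labels) / n
    return 1 - brier / brier_null


def integer_points(coefs):
    """Scale coefficients by the smallest absolute value, then round."""
    base = min(abs(b) for b in coefs)
    return [round(b / base) for b in coefs]


# Hypothetical log-odds coefficients from a refit on dichotomized covariates.
points = integer_points([0.42, 0.85, 1.30])
print(points)  # [1, 2, 3]

# Hypothetical toy data: one continuous predictor and a binary outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

# Dichotomize at a hypothetical cut point of 4.
x_binary = [1 if v > 4 else 0 for v in x]

print(auc(x, y))         # 0.8125 (continuous predictor)
print(auc(x_binary, y))  # 0.75   (dichotomized predictor)
```

In this toy example the dichotomized predictor ranks patients less well than the continuous one, which is the direction of effect the abstract reports; the size of the drop here is only an artifact of the made-up data.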