Developing a Clinical Prediction Score: Comparing Prediction Accuracy of Integer Scores to Statistical Regression Models

Bibliographic details
Published in: Anesthesia and analgesia, Vol. 132, No. 6, p. 1603
Main authors: Subramanian, Vigneshwar; Mascha, Edward J; Kattan, Michael W
Format: Journal Article
Language: English
Published: United States, June 1, 2021
ISSN: 1526-7598
Abstract:
Researchers often convert prediction tools built on statistical regression models into integer scores and risk classification systems in the name of simplicity. However, this workflow discards useful information and reduces prediction accuracy. We, therefore, investigated the impact on prediction accuracy when researchers simplify a regression model into an integer score using a simulation study and an example clinical data set.

Simulated independent training and test sets (n = 1000) were randomly generated such that a logistic regression model would perform at a specified target area under the receiver operating characteristic curve (AUC) of 0.7, 0.8, or 0.9. After fitting a logistic regression with continuous covariates to each data set, continuous variables were dichotomized using data-dependent cut points. A logistic regression was refit, and the coefficients were scaled and rounded to create an integer score. A risk classification system was built by stratifying integer scores into low-, intermediate-, and high-risk tertiles. Discrimination and calibration were assessed by calculating the AUC and index of prediction accuracy (IPA) for each model. The optimism in performance between the training set and test set was calculated for both AUC and IPA.

The logistic regression model using the continuous form of covariates outperformed all other models. In the simulation study, converting the logistic regression model to an integer score and subsequent risk classification system incurred an average decrease of 0.057-0.094 in AUC, and an absolute 6.2%-17.5% in IPA. The largest decrease in both AUC and IPA occurred in the dichotomization step. The dichotomization and risk stratification steps also increased the optimism of the resulting models, such that they appeared to be able to predict better than they actually would on new data. In the clinical data set, converting the logistic regression with continuous covariates to an integer score incurred a decrease in externally validated AUC of 0.06 and a decrease in externally validated IPA of 13%.

Converting a regression model to an integer score decreases model performance considerably. Therefore, we recommend developing a regression model that incorporates all available information to make the most accurate predictions possible, and using the unaltered regression model when making predictions for individual patients. In all cases, researchers should be mindful that they correctly validate the specific model that is intended for clinical use.
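
The workflow described in the abstract (fit on continuous covariates, dichotomize, refit, scale and round the coefficients, then compare AUC and IPA on a test set) can be illustrated with a minimal sketch. This is not the authors' code. Everything here is an assumption made for illustration: the function names, the three standard-normal covariates with arbitrary coefficients (not tuned to the target AUCs of 0.7, 0.8, or 0.9), training-set medians as the data-dependent cut points, scaling by the smallest coefficient, a plain gradient-descent fit, and IPA computed as 1 minus the ratio of the model's Brier score to that of a prevalence-only null model. The tertile risk-classification step is omitted for brevity.

```python
import math
import random
import statistics


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta, X):
    """Predicted probabilities for coefficients [intercept, b1, ...]."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row))) for row in X]


def fit_logistic(X, y, iters=300, lr=1.0):
    """Logistic regression by plain gradient descent (adequate for this toy data)."""
    n = len(X)
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(iters):
        errs = [p - yi for p, yi in zip(predict(beta, X), y)]
        grad = [sum(errs)] + [sum(e * x for e, x in zip(errs, col)) for col in zip(*X)]
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


def auc(y, probs):
    """Chance that a random event outranks a random non-event (ties count 1/2)."""
    pos = [p for p, yi in zip(probs, y) if yi == 1]
    neg = [p for p, yi in zip(probs, y) if yi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def ipa(y, probs):
    """Assumed IPA form: 1 - Brier(model) / Brier(null model predicting prevalence)."""
    prevalence = sum(y) / len(y)
    brier = sum((p - yi) ** 2 for p, yi in zip(probs, y)) / len(y)
    brier_null = sum((prevalence - yi) ** 2 for yi in y) / len(y)
    return 1.0 - brier / brier_null


def integer_points(coefs):
    """Scale coefficients by the smallest nonzero magnitude, then round."""
    base = min(abs(c) for c in coefs if c != 0)
    return [round(c / base) for c in coefs]


def simulate(n, rng, true_beta=(1.0, 0.7, 0.4)):
    """Hypothetical data generator: standard-normal covariates, logistic outcome."""
    X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(n)]
    y = [int(rng.random() < sigmoid(sum(b * x for b, x in zip(true_beta, row)))) for row in X]
    return X, y


def run(seed=0, n=1000):
    rng = random.Random(seed)
    (X_tr, y_tr), (X_te, y_te) = simulate(n, rng), simulate(n, rng)

    def evaluate(beta, F_tr, F_te):
        p_tr, p_te = predict(beta, F_tr), predict(beta, F_te)
        a_tr, a_te = auc(y_tr, p_tr), auc(y_te, p_te)
        return {"auc": a_te, "ipa": ipa(y_te, p_te), "auc_optimism": a_tr - a_te}

    results = {}
    # Step 1: logistic regression on the continuous covariates.
    results["continuous"] = evaluate(fit_logistic(X_tr, y_tr), X_tr, X_te)

    # Step 2: dichotomize at training-set medians (an assumed cut-point rule), refit.
    cuts = [statistics.median(col) for col in zip(*X_tr)]

    def dichotomize(X):
        return [[int(x > c) for x, c in zip(row, cuts)] for row in X]

    D_tr, D_te = dichotomize(X_tr), dichotomize(X_te)
    binary = fit_logistic(D_tr, y_tr)
    results["dichotomized"] = evaluate(binary, D_tr, D_te)

    # Step 3: integer points; the summed score (rescaled for a stable fit) is
    # mapped back to a probability with a one-covariate logistic model.
    points = integer_points(binary[1:])
    scale = sum(abs(p) for p in points)

    def score(D):
        return [[sum(p * d for p, d in zip(points, row)) / scale] for row in D]

    S_tr, S_te = score(D_tr), score(D_te)
    results["integer_score"] = evaluate(fit_logistic(S_tr, y_tr), S_tr, S_te)
    return points, results
```

Calling `points, results = run(seed=0)` returns the integer points per covariate and, for each of the three models, the test-set AUC, the test-set IPA, and the AUC optimism (training AUC minus test AUC). The numbers from this toy setup are not comparable to the figures reported in the abstract.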
Author affiliations:
Subramanian, Vigneshwar: From the Cleveland Clinic Lerner College of Medicine at Case Western Reserve University, Cleveland, Ohio
Mascha, Edward J: Departments of Quantitative Health Sciences and Outcomes Research and
Kattan, Michael W: Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio
PubMed (PMID 33464759): https://www.ncbi.nlm.nih.gov/pubmed/33464759
Copyright: © 2021 International Anesthesia Research Society.
DOI: 10.1213/ANE.0000000000005362
Genre: Journal Article; Comparative Study
Subject terms: Area Under Curve; Computer Simulation - statistics & numerical data; Computer Simulation - trends; Forecasting; Humans; Models, Statistical; Regression Analysis; ROC Curve; Stroke - diagnosis; Stroke - epidemiology