How to manage missing covariates in randomized controlled trials: a comparison of strategies.
| Title: | How to manage missing covariates in randomized controlled trials: a comparison of strategies. |
|---|---|
| Authors: | Zhang S; Institute for Social Research, University of Michigan, 426 Thompson St, 48104, Ann Arbor, MI, US. zsy@umich.edu., Si Y; Institute for Social Research, University of Michigan, 426 Thompson St, 48104, Ann Arbor, MI, US., Dziak JJ; Institute for Social Research, University of Michigan, 426 Thompson St, 48104, Ann Arbor, MI, US. |
| Source: | BMC medical research methodology [BMC Med Res Methodol] 2025 Nov 25; Vol. 25 (1), pp. 264. Date of Electronic Publication: 2025 Nov 25. |
| Publication Type: | Journal Article; Comparative Study |
| Language: | English |
| Journal Info: | Publisher: BioMed Central Country of Publication: England NLM ID: 100968545 Publication Model: Electronic Cited Medium: Internet ISSN: 1471-2288 (Electronic) Linking ISSN: 14712288 NLM ISO Abbreviation: BMC Med Res Methodol Subsets: MEDLINE |
| Imprint Name(s): | Original Publication: London : BioMed Central, [2001- |
| MeSH Terms: | Randomized Controlled Trials as Topic*/methods , Randomized Controlled Trials as Topic*/statistics & numerical data , Research Design*, Humans ; Data Interpretation, Statistical ; Models, Statistical ; Bias ; Computer Simulation |
| Abstract: | Competing Interests: Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests. Background: When analyzing randomized controlled trial (RCT) data, covariate adjustment is often employed to increase the precision of estimated treatment effects. Missing data in covariates, if not handled properly, can result in biased and inefficient estimates. However, the existing literature on handling missing covariate data is limited, and recommendations vary regarding a valid and efficient approach. Methods: To help reconcile the seemingly inconsistent recommendations, we address two questions through methodological descriptions and simulated demonstrations. First, how should a multiple imputation (MI) model be specified for RCTs to best preserve the benefit of the randomization design? We consider three different approaches: MI with only baseline variables, "MI overall", and "MI by arm". Second, when and why will simple general strategies, such as grand mean imputation and the missing indicator method, perform as well as or better than MI in estimating treatment effects, and when and why do they fail? Results: "MI by arm" has the potential to produce unbiased estimates for both the average and subgroup treatment effects (primary and secondary analyses) under the missing at random (MAR) assumption. Strategies that capitalize on the randomization design, including MI with baseline variables, grand mean imputation, and the missing indicator method, may generate unbiased estimates for the average treatment effect (primary analysis) regardless of the missing data mechanism. Conclusion: This article clarifies the assumptions and mechanisms by which different missing data strategies accommodate missingness in covariates and reconciles recommendations that sometimes appear contradictory in the literature. Under MAR, "MI by arm" produces unbiased estimates for both the average treatment effect and subgroup treatment effects. Leveraging the randomization design, "baseline-only MI", grand mean imputation, and the missing indicator method produce unbiased estimates for the average treatment effect, but biased subgroup treatment effects, regardless of the missing data mechanism. (© 2025. The Author(s).) |
| Contributed Indexing: | Keywords: Grand mean imputation; Missing covariates; Missing indicator method; Multiple imputation; Randomized controlled trials |
| Entry Date(s): | Date Created: 20251126 Date Completed: 20251126 Latest Revision: 20251128 |
| Update Code: | 20251128 |
| PubMed Central ID: | PMC12649034 |
| DOI: | 10.1186/s12874-025-02708-w |
| PMID: | 41291471 |
| Database: | MEDLINE |
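
The two simple strategies named in the abstract, grand mean imputation and the missing indicator method, can be illustrated with a minimal sketch. This is not the authors' simulation: the function names, the data-generating values (sample size, a true effect of 2.0, a logistic missingness model) and the use of ordinary least squares are all assumptions made for illustration. The covariate is made more likely to be missing when its own value is large, so the mechanism is not MAR, yet missingness does not depend on the randomized arm.

```python
import numpy as np


def grand_mean_impute(x):
    """Fill NaN entries with the mean of the observed values, pooled over both arms."""
    x = np.asarray(x, dtype=float)
    filled = x.copy()
    filled[np.isnan(x)] = np.nanmean(x)
    return filled


def missing_indicator_columns(x):
    """Return (covariate with NaN set to 0, 0/1 indicator of missingness)."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    return np.where(missing, 0.0, x), missing.astype(float)


def treatment_coefficient(y, treat, *covariate_columns):
    """OLS of y on an intercept, the treatment indicator, and the given covariate columns."""
    design = np.column_stack([np.ones_like(y), treat, *covariate_columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]  # coefficient on the treatment indicator


# Hypothetical simulated trial (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
n = 20_000
true_effect = 2.0
x = rng.normal(size=n)                            # baseline covariate
treat = rng.integers(0, 2, size=n).astype(float)  # 1:1 randomization
y = 1.0 + true_effect * treat + 1.5 * x + rng.normal(size=n)

# Missingness depends on the covariate's own value, but not on the arm.
p_missing = 1.0 / (1.0 + np.exp(-x))
x_obs = np.where(rng.random(n) < p_missing, np.nan, x)

est_grand_mean = treatment_coefficient(y, treat, grand_mean_impute(x_obs))
est_indicator = treatment_coefficient(y, treat, *missing_indicator_columns(x_obs))
print(f"grand mean imputation: {est_grand_mean:.3f}")
print(f"missing indicator:     {est_indicator:.3f}")
```

In this setup both estimates should land near the assumed true effect, which is consistent with the abstract's statement about the average treatment effect. The sketch does not examine subgroup effects, where the abstract reports that these strategies are biased.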