Adjusting for principal components can induce collider bias in genome-wide association studies

Bibliographic details
Published in: PLoS genetics, Volume 20, Issue 12, p. e1011242
Main authors: Grinde, Kelsey E., Browning, Brian L., Reiner, Alexander P., Thornton, Timothy A., Browning, Sharon R.
Format: Journal Article
Language: English
Published: United States, Public Library of Science, 16 December 2024
Public Library of Science (PLoS)
ISSN: 1553-7404, 1553-7390
Description
Summary: Principal component analysis (PCA) is widely used to control for population structure in genome-wide association studies (GWAS). Top principal components (PCs) typically reflect population structure, but challenges arise in deciding how many PCs are needed and ensuring that PCs do not capture other artifacts such as regions with atypical linkage disequilibrium (LD). In response to the latter, many groups suggest performing LD pruning or excluding known high LD regions prior to PCA. However, these suggestions are not universally implemented and the implications for GWAS are not fully understood, especially in the context of admixed populations. In this paper, we investigate the impact of pre-processing and the number of PCs included in GWAS models in African American samples from the Women’s Health Initiative SNP Health Association Resource and two Trans-Omics for Precision Medicine Whole Genome Sequencing Project contributing studies (Jackson Heart Study and Genetic Epidemiology of Chronic Obstructive Pulmonary Disease Study). In all three samples, we find the first PC is highly correlated with genome-wide ancestry whereas later PCs often capture local genomic features. The pattern of which, and how many, genetic variants are highly correlated with individual PCs differs from what has been observed in prior studies focused on European populations and leads to distinct downstream consequences: adjusting for such PCs yields biased effect size estimates and elevated rates of spurious associations due to the phenomenon of collider bias. Excluding high LD regions identified in previous studies does not resolve these issues. LD pruning proves more effective, but the optimal choice of thresholds varies across datasets. Altogether, our work highlights unique issues that arise when using PCA to control for ancestral heterogeneity in admixed populations and demonstrates the importance of careful pre-processing and diagnostics to ensure that PCs capturing multiple local genomic features are not included in GWAS models.
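The collider-bias mechanism named in the summary can be illustrated with a toy simulation. This is only a sketch under invented assumptions, not the authors' analysis: two independent variants (`g1`, `g2`), a phenotype `y` that depends on `g2` alone, and a hypothetical component `pc` driven by both variants, standing in for a PC that captures several local genomic features. All names, allele frequencies, and effect sizes are illustrative.

```python
import numpy as np


def simulate_collider_bias(n=50_000, seed=0):
    """Return the estimated effect of g1 on y, without and with pc adjustment."""
    rng = np.random.default_rng(seed)

    # Two independent variants coded as allele counts (0, 1, 2).
    g1 = rng.binomial(2, 0.3, size=n).astype(float)  # tested variant, no true effect
    g2 = rng.binomial(2, 0.3, size=n).astype(float)  # causal variant

    # Phenotype depends on g2 only (assumed effect size 0.5).
    y = 0.5 * g2 + rng.normal(size=n)

    # Hypothetical PC driven by both variants: a collider (g1 -> pc <- g2).
    pc = g1 + g2 + rng.normal(scale=0.5, size=n)

    def slope_of_g1(covariates):
        # Ordinary least squares of y on intercept, g1, and any covariates.
        design = np.column_stack([np.ones(n), g1, *covariates])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        return beta[1]

    return slope_of_g1([]), slope_of_g1([pc])


unadjusted, adjusted = simulate_collider_bias()
print(f"g1 effect without pc adjustment: {unadjusted:+.3f}")
print(f"g1 effect with pc adjustment:    {adjusted:+.3f}")
```

In this toy setup the unadjusted estimate for `g1` should sit near zero, while conditioning on `pc` induces a dependence between `g1` and `g2` and pushes the estimate for `g1` clearly away from zero, even though `g1` has no effect on `y`.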
Notes:
I have read the journal’s policy and the authors of this manuscript have the following competing interests: T.A.T. is a current employee of Regeneron Genetics Center and stockholder of Regeneron Pharmaceuticals. The other authors have no competing interests to declare.
DOI: 10.1371/journal.pgen.1011242