Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies

Bibliographic Details
Published in:	Nature Genetics Vol. 50; no. 9; pp. 1335-1341
Main Authors:	Zhou, Wei; Nielsen, Jonas B.; Fritsche, Lars G.; Dey, Rounak; Gabrielsen, Maiken E.; Wolford, Brooke N.; LeFaive, Jonathon; VandeHaar, Peter; Gagliano, Sarah A.; Gifford, Aliya; Bastarache, Lisa A.; Wei, Wei-Qi; Denny, Joshua C.; Lin, Maoxuan; Hveem, Kristian; Kang, Hyun Min; Abecasis, Goncalo R.; Willer, Cristen J.; Lee, Seunggeun
Format:	Journal Article
Language:	English
Published:	New York: Nature Publishing Group US, 01.09.2018
Subjects:	45; 45/43; 631/208/205/2138; 639/705/531; Agriculture; Analysis; Animal Genetics and Genomics; Biomedical and Life Sciences; Biomedicine; Cancer Research; Case-Control Studies; Computer applications; Computer Simulation; Costs; Data processing; Electronic health records; Gene Function; Genome-wide association studies; Genome-Wide Association Study - methods; Genomes; Human Genetics; Humans; Linear Models; Logistic Models; Methods; Model testing; Models, Genetic; Normal distribution; Parameter estimation; Phenotype; Phenotypes; Polymorphism, Single Nucleotide; Quantitative genetics; Statistical analysis; Statistical methods; Statistical tests; United Kingdom > UK
ISSN:	1061-4036, 1546-1718

Description
Summary:	In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-the-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes in large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for >1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness. SAIGE (Scalable and Accurate Implementation of GEneralized mixed model) is a generalized mixed model association test that can efficiently analyze large data sets while controlling for unbalanced case-control ratios and sample relatedness, as shown by applying SAIGE to the UK Biobank data for >1,400 binary phenotypes.
Bibliography:	These authors contributed equally to this work
DOI:	10.1038/s41588-018-0184-y
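
Illustration:	The summary states that the saddlepoint approximation is used to calibrate the distribution of score test statistics. The sketch below is not SAIGE and is not taken from the article; it is a minimal toy, assuming unrelated samples, no random effects, and known null case probabilities mu_i, so that the score statistic is S = sum_i g_i (Y_i - mu_i) with Y_i ~ Bernoulli(mu_i). All function names, the bisection bound, and the 2-standard-deviation cutoff for falling back to the normal approximation are assumptions made for this example.

```python
import math


def normal_sf(x):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def score(g, y, mu):
    """Score statistic S = sum_i g_i * (y_i - mu_i)."""
    return sum(gi * (yi - mi) for gi, yi, mi in zip(g, y, mu))


def cgf(t, g, mu):
    """K(t), K'(t), K''(t) for S under the null, Y_i ~ Bernoulli(mu_i)."""
    k0 = k1 = k2 = 0.0
    for gi, mi in zip(g, mu):
        if gi == 0:
            continue  # non-carriers contribute nothing to S
        e = math.exp(gi * t)
        d = 1.0 - mi + mi * e
        k0 += math.log(d) - t * gi * mi
        k1 += gi * mi * e / d - gi * mi
        k2 += gi * gi * mi * (1.0 - mi) * e / (d * d)
    return k0, k1, k2


def spa_tail(s, g, mu, bound=30.0):
    """Saddlepoint tail: P(S >= s) if s > 0, P(S <= s) if s < 0."""
    lo, hi = -bound, bound
    if not (cgf(lo, g, mu)[1] < s < cgf(hi, g, mu)[1]):
        return 0.0  # at or beyond the edge of the support: treated as 0 here
    for _ in range(100):  # bisection on the increasing function K'(t) = s
        mid = 0.5 * (lo + hi)
        if cgf(mid, g, mu)[1] < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    k0, _, k2 = cgf(t, g, mu)
    w = math.copysign(math.sqrt(2.0 * (t * s - k0)), t)
    v = t * math.sqrt(k2)
    z = w + math.log(v / w) / w  # Barndorff-Nielsen form
    return normal_sf(z) if s > 0 else normal_sf(-z)


def normal_p_value(s, g, mu):
    """Two-sided P value from the usual normal approximation."""
    var = cgf(0.0, g, mu)[2]  # K''(0) is the null variance of S
    return 2.0 * normal_sf(abs(s) / math.sqrt(var))


def spa_p_value(s, g, mu, cutoff=2.0):
    """Two-sided P value; the saddlepoint is used only far from the mean."""
    var = cgf(0.0, g, mu)[2]
    if abs(s) < cutoff * math.sqrt(var):
        return normal_p_value(s, g, mu)
    return spa_tail(abs(s), g, mu) + spa_tail(-abs(s), g, mu)


if __name__ == "__main__":
    # Hypothetical toy data: 20 heterozygous carriers, 0.5% case probability,
    # 3 of the carriers are cases.
    g = [1] * 20
    mu = [0.005] * 20
    y = [1, 1, 1] + [0] * 17
    s = score(g, y, mu)
    print("normal approximation:", normal_p_value(s, g, mu))
    print("saddlepoint approximation:", spa_p_value(s, g, mu))
```

In this toy setup the number of case carriers is Binomial(20, 0.005), so an exact tail is available as a yardstick. The normal approximation returns a P value many orders of magnitude smaller than the exact tail, which is the kind of type I error inflation the summary describes for unbalanced phenotypes, while the saddlepoint value stays on the same scale as the exact tail. Because the toy statistic is discrete and no continuity correction is applied, the saddlepoint value is still only approximate.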