Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies

Bibliographic Details
Published in:	PLoS genetics Vol. 12; no. 2; p. e1005767
Main Authors:	Liu, Xiaolei; Huang, Meng; Fan, Bin; Buckler, Edward S.; Zhang, Zhiwu
Format:	Journal Article
Language:	English
Published:	United States: Public Library of Science (PLoS), 01.02.2016
Subjects:	Arabidopsis - genetics; Big Data; Biology and Life Sciences; Datasets; Flowers - genetics; Flowers - physiology; Genes, Plant; Genetic Loci; Genome-wide association studies; Genome-Wide Association Study; Genomes; Genotype & phenotype; Humans; Linear models (Statistics); Lung cancer; Medicine and Health Sciences; Models, Genetic; Physical Sciences; Population; Power; Quantitative Trait, Heritable; Research and Analysis Methods; Software; Species Specificity; Statistical methods; Studies
ISSN:	1553-7404, 1553-7390

Description
Summary:	False positives in a Genome-Wide Association Study (GWAS) can be effectively controlled by a Mixed Linear Model (MLM) with fixed and random effects that incorporates population structure and kinship among individuals to adjust association tests on markers; however, the adjustment also compromises true positives. A modified MLM method, the Multiple Loci Linear Mixed Model (MLMM), incorporates multiple markers simultaneously as covariates in a stepwise MLM to partially remove the confounding between testing markers and kinship. To completely eliminate the confounding, we divided MLMM into two parts, a Fixed Effect Model (FEM) and a Random Effect Model (REM), and used them iteratively. FEM contains testing markers, one at a time, and multiple associated markers as covariates to control false positives. To avoid over-fitting in FEM, the associated markers are estimated in REM by using them to define kinship. The P values of testing markers and the associated markers are unified at each iteration. We named the new method Fixed and random model Circulating Probability Unification (FarmCPU). Analyses of both real and simulated data demonstrated that FarmCPU improves statistical power compared to current methods. Additional benefits include an efficient computing time that is linear in both the number of individuals and the number of markers. A dataset with half a million individuals and half a million markers can now be analyzed within three days.
Bibliography:	Conceived and designed the experiments: ZZ BF ESB. Performed the experiments: XL. Analyzed the data: XL MH. Contributed reagents/materials/analysis tools: XL ZZ. Wrote the paper: ZZ XL. Supervised the design of the study: ZZ BF ESB. The authors have declared that no competing interests exist.
DOI:	10.1371/journal.pgen.1005767
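
Illustration (not part of the catalog record or of the published software): the summary describes the FEM step as testing one marker at a time while previously associated markers enter the model as covariates. The sketch below is a minimal, hedged illustration of that single idea using ordinary least squares in pure Python. It is not the FarmCPU implementation: the REM step, the kinship definition, and the P-value unification are all omitted, the data are simulated, and the names `solve` and `fem_t_stat` are hypothetical helpers introduced only for this example.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fem_t_stat(y, marker, covariates=()):
    """t statistic for `marker` in y ~ intercept + covariates + marker (plain OLS)."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in covariates] + [list(marker)]
    p = len(cols)
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * cols[j][i] for j in range(p)) for i in range(n)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sigma2 = rss / (n - p)  # residual variance estimate
    # Last diagonal entry of (X'X)^-1, obtained by solving against a unit vector.
    unit = [0.0] * (p - 1) + [1.0]
    var_last = sigma2 * solve(xtx, unit)[-1]
    return beta[-1] / math.sqrt(var_last)


# Simulated genotypes coded 0/1/2 (an assumption made for this example only).
random.seed(1)
n = 200
causal = [float(random.randint(0, 2)) for _ in range(n)]
# A linked marker: mostly a copy of the causal marker, redrawn for about 20% of individuals.
linked = [g if random.random() < 0.8 else float(random.randint(0, 2)) for g in causal]
y = [3.0 * g + random.gauss(0.0, 1.0) for g in causal]

t_causal = fem_t_stat(y, causal)
t_linked_alone = fem_t_stat(y, linked)
t_linked_adjusted = fem_t_stat(y, linked, covariates=[causal])

print("causal marker, tested alone:        ", round(t_causal, 2))
print("linked marker, tested alone:        ", round(t_linked_alone, 2))
print("linked marker, causal as covariate: ", round(t_linked_adjusted, 2))
```

In this toy setting the linked marker looks strongly associated when tested alone, and its signal largely disappears once the causal marker is included as a covariate, which is the kind of false-positive control the summary attributes to FEM.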