Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations

Bibliographic Details
Published in:	Communications biology, vol. 5, no. 1, p. 856 - 12
Main Authors:	Elgart, Michael; Lyons, Genevieve; Romero-Brufau, Santiago; Kurniansyah, Nuzulul; Brody, Jennifer A.; Guo, Xiuqing; Lin, Henry J.; Raffield, Laura; Gao, Yan; Chen, Han; de Vries, Paul; Lloyd-Jones, Donald M.; Lange, Leslie A.; Peloso, Gina M.; Fornage, Myriam; Rotter, Jerome I.; Rich, Stephen S.; Morrison, Alanna C.; Psaty, Bruce M.; Levy, Daniel; Redline, Susan; Sofer, Tamar
Format:	Journal Article
Language:	English
Published:	London: Nature Publishing Group UK, 22 August 2022; Nature Publishing Group; Nature Portfolio
Subjects:	45/43; 631/114/1305; 631/208/205/2138; Biology; Biomedical and Life Sciences; Blood pressure; Body mass index; Cholesterol; Genetic Predisposition to Disease; Genome-Wide Association Study; High density lipoprotein; Humans; Learning algorithms; Life Sciences; Low density lipoprotein; Machine Learning; Minority & ethnic groups; Multifactorial Inheritance; Phenotypes; Polygenic inheritance; Polymorphism, Single Nucleotide; Prediction models; Single-nucleotide polymorphism; Triglycerides
ISSN:	2399-3642
Online Access:	Full text

Description
Abstract:	Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in a multi-ancestry population. We use an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects. We compare our results to the standard, linear PRS model developed using PRSice, LDpred2, and lassosum2. Combining a PRS as a feature in an XGBoost model results in a relative increase in the percentage variance explained compared to the standard linear PRS model by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to specific racial/ethnic group trained models and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models. Combining a standard polygenic risk score (PRS) as a feature in a machine learning model increases the percentage variance explained for those traits, helping to account for non-linearities or interaction effects in genetics-based prediction models.
DOI:	10.1038/s42003-022-03812-z
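
Note on the reported figures: the abstract gives each gain as a relative increase in percentage variance explained (PVE), so "100% for diastolic blood pressure" means the PVE doubled, not that all variance was explained. The sketch below illustrates that arithmetic. It assumes PVE is computed as 100 × (1 − SS_res / SS_tot), which may differ from the exact definition used in the article (for example, if covariates are adjusted for). The function names and the toy numbers are hypothetical and are not taken from the study.

```python
from statistics import fmean


def percent_variance_explained(observed, predicted):
    """Return 100 * (1 - SS_res / SS_tot) for paired observed/predicted values.

    Assumed definition of PVE; the article may compute it differently.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = fmean(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 100.0 * (1.0 - ss_res / ss_tot)


def relative_increase(pve_baseline, pve_new):
    """Relative gain of pve_new over pve_baseline, in percent."""
    if pve_baseline <= 0:
        raise ValueError("baseline PVE must be positive")
    return 100.0 * (pve_new - pve_baseline) / pve_baseline


# Made-up toy numbers for illustration only, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
linear_prs_pred = [2.0, 2.5, 3.0, 3.5, 4.0]  # hypothetical linear PRS predictions
boosted_pred = [1.5, 2.0, 3.0, 4.0, 4.5]     # hypothetical PRS + boosted-tree predictions

pve_prs = percent_variance_explained(observed, linear_prs_pred)
pve_ml = percent_variance_explained(observed, boosted_pred)
gain = relative_increase(pve_prs, pve_ml)
print(f"PVE linear: {pve_prs:.1f}%, PVE boosted: {pve_ml:.1f}%, relative increase: {gain:.1f}%")
```

Under this reading, a model whose PVE rises from 2% to 4% shows a relative increase of 100%, the same figure the abstract reports for diastolic blood pressure.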