T72. AUTOENCODER BASED CORRECTION OF POPULATION STRATIFICATION PER INDIVIDUAL SNP'S
| Published in: | European Neuropsychopharmacology, Vol. 75, p. S200 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 1 October 2023 |
| ISSN: | 0924-977X, 1873-7862 |
| Summary: | Population stratification occurs when sub-populations exhibit distinct allele frequencies due to geographic separation and limited genetic exchange. This poses challenges in genetic analysis, because allelic differences can confound the detection of causal links between genetic variants and phenotypic traits. To address this, Principal Component Analysis (PCA) is commonly used to infer population stratification and adjust for its impact in genome-wide association studies (GWAS). However, the performance of PCA decreases when analyzing rare variants, which leads to their exclusion from analysis. Paradoxically, rare variants hold valuable information about hidden population substructures not captured by reference panels. This highlights the importance of considering rare variants in accurately correcting spurious associations induced by these variants.
For our correction we use autoencoders, a type of neural network consisting of an encoder that compresses input data into a lower-dimensional representation and a decoder that reconstructs the original data. We then cluster the single nucleotide polymorphisms (SNPs) based on similarity. A k-means approach is used to identify SNP clusters, potentially capturing linkage disequilibrium (LD) implicitly. These clusters may contain important information about unlabeled population substructures that traditional methods like PCA might miss, impacting population stratification bias. Empirical evidence, including a visualization of allele frequency differences per ancestry, supports the notion that lower-dimensional representations of similar SNPs reveal hidden population structure not evident through conventional analysis.
The SNP clusters are then used to create an allele frequency landscape which can subsequently be used as correction for population stratification in GWA studies.
Through extensive simulations, we consistently observed that our model outperformed classical methods in effectively handling rare variants. Notably, our model demonstrated comparable performance to classical methods when dealing with very common variants. These findings highlight the superiority of our model in accurately capturing and correcting for population stratification associated with rare variants. The simulations provided valuable insights into the robustness and effectiveness of our model, supporting its potential as a reliable tool for population stratification correction in genetic studies.
If the basic quality control steps involve filtering out rare variants, classical correction methods like PCA may be more efficient due to the substantial computational requirements of our model. Nevertheless, our results highlight the potential and value of our model in overcoming the limitations of classical methods when dealing with rare variants, leading to more accurate genetic structure representation and reliable identification of phenotype-associated genetic variants. Further research should focus on optimizing the computational efficiency of our model and identifying specific scenarios where it provides the greatest benefits. Overall, our study contributes to advancing population stratification analysis and underscores the potential of our model in improving the accuracy of genetic analyses involving rare variants. |
|---|---|
| DOI: | 10.1016/j.euroneuro.2023.08.356 |
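
The encoder, decoder, and k-means steps described in the summary can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the toy genotype simulation, the choice to represent each SNP by its standardized genotype vector across individuals, the single tanh hidden layer with two latent units, the Adam optimizer settings, and the helper names `train_autoencoder` and `kmeans`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 200 individuals from two sub-populations, 60 SNPs.
# The first 30 SNPs differ in allele frequency between groups; the rest do not.
n_ind, n_snp = 200, 60
pop = np.repeat([0, 1], n_ind // 2)
freq = np.full((2, n_snp), 0.3)
freq[0, :30], freq[1, :30] = 0.1, 0.6
genotypes = rng.binomial(2, freq[pop])  # shape (n_ind, n_snp), values 0/1/2

# Each SNP becomes one training sample: its standardized genotype vector.
X = genotypes.T.astype(float)
X = (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-8)


def train_autoencoder(X, n_latent=2, lr=0.01, epochs=300, seed=0):
    """Hypothetical helper: one-hidden-layer autoencoder trained with Adam."""
    r = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = r.normal(0, 0.05, (d, n_latent)), np.zeros(n_latent)
    W2, b2 = r.normal(0, 0.1, (n_latent, d)), np.zeros(d)
    params = [W1, b1, W2, b2]
    m = [np.zeros_like(p) for p in params]
    v = [np.zeros_like(p) for p in params]
    losses = []
    for t in range(1, epochs + 1):
        H = np.tanh(X @ W1 + b1)          # encoder: compress to n_latent dims
        err = H @ W2 + b2 - X             # decoder output minus input
        losses.append(float(np.mean(err ** 2)))
        gR = 2 * err / err.size           # gradient of the mean squared error
        gZ = (gR @ W2.T) * (1 - H ** 2)   # backprop through tanh
        grads = [X.T @ gZ, gZ.sum(axis=0), H.T @ gR, gR.sum(axis=0)]
        for p, g, m_i, v_i in zip(params, grads, m, v):
            m_i *= 0.9
            m_i += 0.1 * g
            v_i *= 0.999
            v_i += 0.001 * g ** 2
            m_hat = m_i / (1 - 0.9 ** t)
            v_hat = v_i / (1 - 0.999 ** t)
            p -= lr * m_hat / (np.sqrt(v_hat) + 1e-8)  # in-place Adam update
    return np.tanh(X @ W1 + b1), losses


def kmeans(Z, k=2, iters=50, seed=0):
    """Hypothetical helper: plain Lloyd's algorithm on the latent codes."""
    r = np.random.default_rng(seed)
    centers = Z[r.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):       # keep the old center if a cluster empties
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


codes, losses = train_autoencoder(X)      # one low-dimensional code per SNP
labels = kmeans(codes, k=2)               # SNP cluster assignments
print("reconstruction loss:", round(losses[0], 3), "->", round(losses[-1], 3))
print("cluster sizes:", np.bincount(labels, minlength=2))
```

In this toy setting the two clusters may line up with the stratified and unstratified SNPs, but that depends on the seed and training settings. Real genotype data would likely need a deeper model, held-out validation, and a principled choice of the number of clusters.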