Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies
| Published in: | American journal of human genetics, Vol. 94, No. 5, p. 662 |
|---|---|
| Main authors: | Aschard, Hugues; Vilhjálmsson, Bjarni J; Greliche, Nicolas; Morange, Pierre-Emmanuel; Trégouët, David-Alexandre; Kraft, Peter |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 1 May 2014 |
| ISSN: | 1537-6605 |
| Abstract | Many human traits are highly correlated. This correlation can be leveraged to improve the power of genetic association tests to identify markers associated with one or more of the traits. Principal component analysis (PCA) is a useful tool that has been widely used for the multivariate analysis of correlated variables. PCA is usually applied as a dimension reduction method: the few top principal components (PCs) explaining most of total trait variance are tested for association with a predictor of interest, and the remaining components are not analyzed. In this study we review the theoretical basis of PCA and describe the behavior of PCA when testing for association between a SNP and correlated traits. We then use simulation to compare the power of various PCA-based strategies when analyzing up to 100 correlated traits. We show that contrary to widespread practice, testing only the top PCs often has low power, whereas combining signal across all PCs can have greater power. This power gain is primarily due to increased power to detect genetic variants with opposite effects on positively correlated traits and variants that are exclusively associated with a single trait. Relative to other methods, the combined-PC approach has close to optimal power in all scenarios considered while offering more flexibility and more robustness to potential confounders. Finally, we apply the proposed PCA strategy to the genome-wide association study of five correlated coagulation traits where we identify two candidate SNPs that were not found by the standard approach. |
|---|---|
| Author | Vilhjálmsson, Bjarni J; Trégouët, David-Alexandre; Greliche, Nicolas; Aschard, Hugues; Morange, Pierre-Emmanuel; Kraft, Peter |
| Author_xml | – sequence: 1 givenname: Hugues surname: Aschard fullname: Aschard, Hugues email: haschard@hsph.harvard.edu organization: Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA. Electronic address: haschard@hsph.harvard.edu – sequence: 2 givenname: Bjarni J surname: Vilhjálmsson fullname: Vilhjálmsson, Bjarni J organization: Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA; Medical and Population Genetics Program, Broad Institute, Cambridge, MA 02142, USA – sequence: 3 givenname: Nicolas surname: Greliche fullname: Greliche, Nicolas organization: Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1166, 75005 Paris, France; INSERM, UMR_S 1166, Genomics and Physiopathology of Cardiovascular Diseases, 75013 Paris, France; Institute for Cardiometabolism and Nutrition (ICAN), 75013 Paris, France – sequence: 4 givenname: Pierre-Emmanuel surname: Morange fullname: Morange, Pierre-Emmanuel organization: Aix-Marseille Université, INSERM UMR_S 1062, 13385 Marseille, France – sequence: 5 givenname: David-Alexandre surname: Trégouët fullname: Trégouët, David-Alexandre organization: Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1166, 75005 Paris, France; INSERM, UMR_S 1166, Genomics and Physiopathology of Cardiovascular Diseases, 75013 Paris, France; Institute for Cardiometabolism and Nutrition (ICAN), 75013 Paris, France – sequence: 6 givenname: Peter surname: Kraft fullname: Kraft, Peter organization: Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/24746957$$D View this record in MEDLINE/PubMed |
| ContentType | Journal Article |
| Copyright | Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved. |
| DOI | 10.1016/j.ajhg.2014.03.016 |
| Discipline | Biology |
| EISSN | 1537-6605 |
| ExternalDocumentID | 24746957 |
| Genre | Research Support, Non-U.S. Gov't; Evaluation Study; Journal Article; Research Support, N.I.H., Extramural |
| GrantInformation_xml | NHGRI NIH HHS: R03 HG006720; NCI NIH HHS: R35 CA197449, R21 CA165920, P01 CA134294 |
| ISICitedReferencesCount | 134 |
| ISSN | 1537-6605 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Language | English |
| License | Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved. |
| OpenAccessLink | http://www.cell.com/article/S0002929714001189/pdf |
| PMID | 24746957 |
| PublicationCentury | 2000 |
| PublicationDate | 2014-May-01 20140501 |
| PublicationDateYYYYMMDD | 2014-05-01 |
| PublicationDecade | 2010 |
| PublicationPlace | United States |
| PublicationTitle | American journal of human genetics |
| PublicationTitleAlternate | Am J Hum Genet |
| PublicationYear | 2014 |
| StartPage | 662 |
| SubjectTerms | Genome-Wide Association Study - statistics & numerical data; Humans; Models, Genetic; Phenotype; Polymorphism, Single Nucleotide; Principal Component Analysis - methods; Quantitative Trait, Heritable; Venous Thrombosis - genetics |
| Title | Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/24746957 https://www.proquest.com/docview/1521326860 |
| Volume | 94 |
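
The abstract's central claim, that testing only the top PC can miss a variant with opposite effects on two positively correlated traits while a test combining all PCs recovers it, can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: the function names, sample size, effect size, trait correlation, and allele frequency are all illustrative assumptions, and summing the per-PC 1-df statistics is assumed here as one simple way to combine signal across PCs, since the abstract does not spell out the exact statistic.

```python
import numpy as np


def simulate_opposite_effects(n=5000, effect=0.1, trait_corr=0.8, maf=0.3, seed=0):
    """Simulate two positively correlated traits and one SNP with opposite effects.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, maf, size=n).astype(float)        # minor-allele counts 0/1/2
    cov = np.array([[1.0, trait_corr], [trait_corr, 1.0]])
    noise = rng.multivariate_normal(np.zeros(2), cov, size=n)
    Y = noise + np.outer(g, [effect, -effect])            # +effect on trait 1, -effect on trait 2
    return Y, g


def pc_association_stats(Y, g):
    """Return per-PC association statistics and the variance fraction of each PC.

    Y is an (n, k) trait matrix and g an (n,) genotype vector. PCs are ordered
    by decreasing explained variance. Each statistic is n * r^2, which is
    approximately chi-square with 1 df when the SNP has no effect.
    """
    n = Y.shape[0]
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)      # standardize each trait
    R = np.corrcoef(Z, rowvar=False)                      # k x k trait correlation matrix
    eigval, eigvec = np.linalg.eigh(R)                    # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]                      # reorder to descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    pcs = Z @ eigvec                                      # (n, k) PC scores, columns mean zero
    gc = g - g.mean()
    r = (pcs.T @ gc) / (np.linalg.norm(pcs, axis=0) * np.linalg.norm(gc))
    chi2 = n * r ** 2
    return chi2, eigval / eigval.sum()


Y, g = simulate_opposite_effects()
chi2, evr = pc_association_stats(Y, g)
top_pc_stat = chi2[0]          # "top PC only" strategy (1 df)
combined_stat = chi2.sum()     # assumed combined strategy: sum over all k PCs (k df)
print("variance explained per PC:", evr)
print("per-PC statistics:", chi2)
print("top-PC statistic:", top_pc_stat, "combined statistic:", combined_stat)
```

In this setup the first PC is roughly the sum of the two traits, so the opposite effects cancel there, and nearly all of the association signal sits in the last PC, which explains the least trait variance. Because the PC scores are uncorrelated in the sample, the summed statistic can be referred to a chi-square distribution with k degrees of freedom (for example with `scipy.stats.chi2.sf`), again under the assumptions stated above.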