Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies

Many human traits are highly correlated. This correlation can be leveraged to improve the power of genetic association tests to identify markers associated with one or more of the traits. Principal component analysis (PCA) is a useful tool that has been widely used for the multivariate analysis of c...

Bibliographic details
Published in: American journal of human genetics, Vol. 94, No. 5, p. 662
Main authors: Aschard, Hugues; Vilhjálmsson, Bjarni J; Greliche, Nicolas; Morange, Pierre-Emmanuel; Trégouët, David-Alexandre; Kraft, Peter
Format: Journal Article
Language: English
Published: United States, 1 May 2014
ISSN: 1537-6605
Online access: Further details
Abstract Many human traits are highly correlated. This correlation can be leveraged to improve the power of genetic association tests to identify markers associated with one or more of the traits. Principal component analysis (PCA) is a useful tool that has been widely used for the multivariate analysis of correlated variables. PCA is usually applied as a dimension reduction method: the few top principal components (PCs) explaining most of total trait variance are tested for association with a predictor of interest, and the remaining components are not analyzed. In this study we review the theoretical basis of PCA and describe the behavior of PCA when testing for association between a SNP and correlated traits. We then use simulation to compare the power of various PCA-based strategies when analyzing up to 100 correlated traits. We show that contrary to widespread practice, testing only the top PCs often has low power, whereas combining signal across all PCs can have greater power. This power gain is primarily due to increased power to detect genetic variants with opposite effects on positively correlated traits and variants that are exclusively associated with a single trait. Relative to other methods, the combined-PC approach has close to optimal power in all scenarios considered while offering more flexibility and more robustness to potential confounders. Finally, we apply the proposed PCA strategy to the genome-wide association study of five correlated coagulation traits where we identify two candidate SNPs that were not found by the standard approach.
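The contrast the abstract draws between testing only the top PCs and combining signal across all PCs can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the simulation parameters, and the choice of n * r^2 as a per-PC statistic are assumptions made for illustration. Because the PCs are uncorrelated, the sketch sums the per-PC statistics and treats the total as approximately chi-square with k degrees of freedom for k traits under the null.

```python
import numpy as np


def pc_association_stats(traits, genotype):
    """Per-PC association statistics between trait PCs and one SNP.

    traits:   (n, k) array of quantitative traits
    genotype: (n,) array of allele counts (0, 1, 2)
    Returns (stats, var_explained), both ordered from top PC to last PC.
    """
    traits = np.asarray(traits, dtype=float)
    genotype = np.asarray(genotype, dtype=float)
    n = traits.shape[0]

    # Standardize traits so the PCA is done on the trait correlation matrix.
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pcs = z @ eigvecs                          # (n, k) PC scores

    # Squared correlation of each PC with the centered genotype; n * r^2 is
    # approximately chi-square with 1 df under the null (assumes large n).
    g = genotype - genotype.mean()
    r = (pcs * g[:, None]).sum(axis=0) / np.sqrt(
        (pcs ** 2).sum(axis=0) * (g ** 2).sum()
    )
    return n * r ** 2, eigvals / eigvals.sum()


def combined_pc_stat(stats):
    """Sum the per-PC statistics; returns (statistic, degrees of freedom)."""
    stats = np.asarray(stats, dtype=float)
    return float(stats.sum()), stats.size


# Illustrative simulation: two positively correlated traits and one SNP with
# opposite effects on them (all parameter values are made up).
rng = np.random.default_rng(0)
n = 2000
genotype = rng.binomial(2, 0.3, size=n).astype(float)
noise = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)
traits = noise + np.outer(genotype, [0.2, -0.2])

stats, var_explained = pc_association_stats(traits, genotype)
combined, df = combined_pc_stat(stats)
print("variance explained per PC:", var_explained)
print("per-PC statistics:", stats)
print("combined statistic:", combined, "on", df, "df")
```

In this simulated setting the SNP effect cancels in the top PC (roughly the sum of the two traits) and concentrates in the last PC (roughly their difference), which is the situation the abstract identifies as the main source of power gain for the combined approach.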
Author_xml – sequence: 1
  givenname: Hugues
  surname: Aschard
  fullname: Aschard, Hugues
  email: haschard@hsph.harvard.edu
  organization: Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA. Electronic address: haschard@hsph.harvard.edu
– sequence: 2
  givenname: Bjarni J
  surname: Vilhjálmsson
  fullname: Vilhjálmsson, Bjarni J
  organization: Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA; Medical and Population Genetics Program, Broad Institute, Cambridge, MA 02142, USA
– sequence: 3
  givenname: Nicolas
  surname: Greliche
  fullname: Greliche, Nicolas
  organization: Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1166, 75005 Paris, France; INSERM, UMR_S 1166, Genomics and Physiopathology of Cardiovascular Diseases, 75013 Paris, France; Institute for Cardiometabolism and Nutrition (ICAN), 75013 Paris, France
– sequence: 4
  givenname: Pierre-Emmanuel
  surname: Morange
  fullname: Morange, Pierre-Emmanuel
  organization: Aix-Marseille Université, INSERM UMR_S 1062, 13385 Marseille, France
– sequence: 5
  givenname: David-Alexandre
  surname: Trégouët
  fullname: Trégouët, David-Alexandre
  organization: Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1166, 75005 Paris, France; INSERM, UMR_S 1166, Genomics and Physiopathology of Cardiovascular Diseases, 75013 Paris, France; Institute for Cardiometabolism and Nutrition (ICAN), 75013 Paris, France
– sequence: 6
  givenname: Peter
  surname: Kraft
  fullname: Kraft, Peter
  organization: Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/24746957$$D View this record in MEDLINE/PubMed
ContentType Journal Article
Copyright Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
DOI 10.1016/j.ajhg.2014.03.016
Discipline Biology
EISSN 1537-6605
ExternalDocumentID 24746957
Genre Research Support, Non-U.S. Gov't
Evaluation Study
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NHGRI NIH HHS
  grantid: R03 HG006720
– fundername: NCI NIH HHS
  grantid: R35 CA197449
– fundername: NCI NIH HHS
  grantid: R21 CA165920
– fundername: NCI NIH HHS
  grantid: P01 CA134294
ISICitedReferencesCount 134
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 5
Language English
OpenAccessLink http://www.cell.com/article/S0002929714001189/pdf
PMID 24746957
PublicationDate 2014-05-01
PublicationPlace United States
PublicationTitle American journal of human genetics
PublicationTitleAlternate Am J Hum Genet
PublicationYear 2014
StartPage 662
SubjectTerms Genome-Wide Association Study - statistics & numerical data
Humans
Models, Genetic
Phenotype
Polymorphism, Single Nucleotide
Principal Component Analysis - methods
Quantitative Trait, Heritable
Venous Thrombosis - genetics
Title Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies
URI https://www.ncbi.nlm.nih.gov/pubmed/24746957
https://www.proquest.com/docview/1521326860
Volume 94