PROJECTED PRINCIPAL COMPONENT ANALYSIS IN FACTOR MODELS
This paper introduces a Projected Principal Component Analysis (Projected-PCA), which employs principal component analysis to the projected (smoothed) data matrix onto a given linear space spanned by covariates. When it applies to high-dimensional factor analysis, the projection removes noise comp...
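
The procedure summarized above (project the data onto a space spanned by functions of the covariates, then run PCA on the projected data) can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names `sieve_basis` and `projected_pca`, the polynomial basis with an intercept, and the default of `J=3` basis terms are all hypothetical choices made here. The data matrix `Y` is assumed to be p x T (p subjects, T time points) and `X` is assumed to be p x d (d covariates per subject).

```python
import numpy as np


def sieve_basis(X, J):
    """Additive polynomial basis (assumed for illustration): an intercept
    plus powers 1..J of each covariate column. Returns a p x (1 + J*d) matrix."""
    p, d = X.shape
    cols = [np.ones((p, 1))]
    for j in range(1, J + 1):
        cols.append(X ** j)
    return np.hstack(cols)


def projected_pca(Y, X, K, J=3):
    """Sketch of Projected-PCA: smooth Y against the covariates, then apply
    PCA to the smoothed data. Returns (F_hat, G_hat) with shapes T x K, p x K."""
    p, T = Y.shape
    Phi = sieve_basis(X, J)
    # Projection of each column of Y onto span(Phi), computed by least
    # squares so the p x p projection matrix P is never formed explicitly.
    coef, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    PY = Phi @ coef  # equals P @ Y
    # P is symmetric and idempotent, so Y' P Y = (P Y)' (P Y), a T x T matrix.
    M = PY.T @ PY
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :K]   # eigenvectors of the K largest eigenvalues
    F_hat = np.sqrt(T) * top        # normalized so that F_hat' F_hat / T = I
    G_hat = PY @ F_hat / T          # covariate-explained part of the loadings
    return F_hat, G_hat


# Usage on simulated data with many subjects and few time points.
rng = np.random.default_rng(0)
p, T, K = 200, 10, 2
X = rng.uniform(-1.0, 1.0, size=(p, 2))
F = rng.standard_normal((T, K))
G = np.column_stack([X[:, 0] ** 2, X[:, 1]])  # loadings driven by covariates
Y = G @ F.T + 0.5 * rng.standard_normal((p, T))
F_hat, G_hat = projected_pca(Y, X, K)
```

As usual in factor models, the estimated factors are identified only up to a rotation, so `F_hat` should be compared with the true factors through their column spaces rather than entry by entry.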
| Published in: | The Annals of Statistics Vol. 44; no. 1; p. 219 |
|---|---|
| Main Authors: | Fan, Jianqing; Liao, Yuan; Wang, Weichen |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 01.02.2016 |
| Subjects: | |
| ISSN: | 0090-5364 |
| Abstract | This paper introduces a Projected Principal Component Analysis (Projected-PCA), which employs principal component analysis to the projected (smoothed) data matrix onto a given linear space spanned by covariates. When it applies to high-dimensional factor analysis, the projection removes noise components. We show that the unobserved latent factors can be more accurately estimated than the conventional PCA if the projection is genuine, or more precisely, when the factor loading matrices are related to the projected linear space. When the dimensionality is large, the factors can be estimated accurately even when the sample size is finite. We propose a flexible semi-parametric factor model, which decomposes the factor loading matrix into the component that can be explained by subject-specific covariates and the orthogonal residual component. The covariates' effects on the factor loadings are further modeled by the additive model via sieve approximations. By using the newly proposed Projected-PCA, the rates of convergence of the smooth factor loading matrices are obtained, which are much faster than those of the conventional factor analysis. The convergence is achieved even when the sample size is finite and is particularly appealing in the high-dimension-low-sample-size situation. This leads us to developing nonparametric tests on whether observed covariates have explaining powers on the loadings and whether they fully explain the loadings. The proposed method is illustrated by both simulated data and the returns of the components of the S&P 500 index. |
|---|---|
| Author | Wang, Weichen; Fan, Jianqing; Liao, Yuan |
| Author_xml | – sequence: 1 givenname: Jianqing surname: Fan fullname: Fan, Jianqing organization: Princeton University – sequence: 2 givenname: Yuan surname: Liao fullname: Liao, Yuan organization: University of Maryland – sequence: 3 givenname: Weichen surname: Wang fullname: Wang, Weichen organization: Princeton University |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/26783374$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_1016_j_jeconom_2017_08_009 crossref_primary_10_1093_jrsssa_qnad086 crossref_primary_10_1016_j_jeconom_2020_04_006 crossref_primary_10_1016_j_jkss_2018_04_005 crossref_primary_10_1016_j_jeconom_2022_04_005 crossref_primary_10_1016_j_jmva_2023_105155 crossref_primary_10_3390_axioms13070418 crossref_primary_10_1080_07350015_2025_2537387 crossref_primary_10_1080_03610926_2019_1576889 crossref_primary_10_1080_01621459_2024_2422129 crossref_primary_10_3390_systems13010026 crossref_primary_10_1093_rfs_hhaa020 crossref_primary_10_3982_QE2330 crossref_primary_10_1017_S0266466623000324 crossref_primary_10_1080_01621459_2025_2538272 crossref_primary_10_1080_03610918_2023_2196748 crossref_primary_10_1016_j_jeconom_2016_11_001 crossref_primary_10_3390_rs14092194 crossref_primary_10_1016_j_jfineco_2019_06_008 crossref_primary_10_1007_s42521_024_00107_2 crossref_primary_10_1016_j_jeconom_2020_09_009 crossref_primary_10_3390_sym13071278 crossref_primary_10_1007_s42952_025_00324_4 crossref_primary_10_1016_j_jmva_2024_105373 crossref_primary_10_1111_jofi_13477 crossref_primary_10_1214_24_AOS2412 crossref_primary_10_1016_j_jeconom_2024_105853 crossref_primary_10_1093_rfs_hhaa102 crossref_primary_10_1080_00273171_2019_1677208 crossref_primary_10_1007_s00180_022_01270_z crossref_primary_10_1016_j_pacfin_2024_102579 crossref_primary_10_1080_01621459_2025_2526697 crossref_primary_10_1086_735513 crossref_primary_10_1080_01621459_2022_2035099 crossref_primary_10_1016_j_jmva_2024_105403 crossref_primary_10_1146_annurev_financial_091420_011735 crossref_primary_10_1016_j_jeconom_2019_08_012 crossref_primary_10_1093_ectj_utac031 crossref_primary_10_1080_07350015_2024_2374971 crossref_primary_10_1287_mnsc_2023_4768 crossref_primary_10_1093_rfs_hhae036 crossref_primary_10_1093_rapstu_raad010 crossref_primary_10_2478_remav_2025_0018 crossref_primary_10_1016_j_apm_2025_116280 crossref_primary_10_1080_10618600_2022_2110883 crossref_primary_10_1093_jjfinec_nbad026 
crossref_primary_10_1016_j_jmva_2021_104786 crossref_primary_10_1016_j_ress_2024_110440 crossref_primary_10_1007_s11408_025_00480_x crossref_primary_10_3390_app15105282 crossref_primary_10_3390_math13071121 crossref_primary_10_1111_caje_12336 crossref_primary_10_1088_1742_6596_1995_1_012065 crossref_primary_10_1080_07350015_2021_1961786 crossref_primary_10_1093_jjfinec_nbad024 crossref_primary_10_3390_math12213442 crossref_primary_10_1111_ectj_12117 crossref_primary_10_1080_07350015_2020_1730857 crossref_primary_10_1080_01621459_2021_1912757 crossref_primary_10_1016_j_csda_2018_03_015 crossref_primary_10_1016_j_jeconom_2018_09_003 crossref_primary_10_1016_j_jeconom_2020_07_013 crossref_primary_10_1080_07350015_2020_1721294 crossref_primary_10_1080_07350015_2020_1844212 crossref_primary_10_1111_ectj_12061 crossref_primary_10_1080_07350015_2025_2548893 crossref_primary_10_1016_j_jeconom_2020_07_009 crossref_primary_10_1093_jrsssb_qkae001 crossref_primary_10_1007_s00521_023_08313_6 crossref_primary_10_1111_biom_12698 crossref_primary_10_1093_jjfinec_nbaa045 crossref_primary_10_1016_j_jeconom_2023_105521 crossref_primary_10_1080_07350015_2024_2449391 crossref_primary_10_1007_s13171_025_00410_z crossref_primary_10_1111_jofi_12898 crossref_primary_10_1080_07350015_2021_2011736 crossref_primary_10_1016_j_jeconom_2022_11_001 crossref_primary_10_1016_j_jeconom_2025_106058 crossref_primary_10_1016_j_jeconom_2020_07_003 crossref_primary_10_1016_j_jeconom_2022_11_007 crossref_primary_10_1016_j_pacfin_2023_102106 crossref_primary_10_1093_jrsssb_qkae036 crossref_primary_10_1016_j_jeconom_2019_05_018 crossref_primary_10_1093_biostatistics_kxae027 crossref_primary_10_1080_02664763_2020_1753024 crossref_primary_10_1002_fut_22559 crossref_primary_10_1515_snde_2025_0042 crossref_primary_10_1016_j_jfineco_2019_05_001 crossref_primary_10_1214_17_AOS1588 crossref_primary_10_1214_21_AOS2152 crossref_primary_10_1080_07350015_2021_1927744 
crossref_primary_10_1146_annurev_financial_101521_104735 crossref_primary_10_1017_S0266466625100091 crossref_primary_10_1016_j_jeconom_2017_06_023 crossref_primary_10_1214_16_AOS1487 crossref_primary_10_1080_01621459_2020_1831927 crossref_primary_10_1287_mnsc_2023_4966 crossref_primary_10_1111_jofi_13324 crossref_primary_10_1017_S0022109024000036 |
| ContentType | Journal Article |
| DBID | NPM 7X8 |
| DOI | 10.1214/15-AOS1364 |
| DatabaseName | PubMed MEDLINE - Academic |
| DatabaseTitle | PubMed MEDLINE - Academic |
| DatabaseTitleList | PubMed MEDLINE - Academic |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: 7X8 name: MEDLINE - Academic url: https://search.proquest.com/medline sourceTypes: Aggregation Database |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Statistics Mathematics |
| ExternalDocumentID | 26783374 |
| Genre | Journal Article |
| GrantInformation_xml | – fundername: NIGMS NIH HHS grantid: R01 GM072611 |
| IEDL.DBID | 7X8 |
| ISICitedReferencesCount | 107 |
| ISSN | 0090-5364 |
| IngestDate | Thu Oct 02 18:28:25 EDT 2025 Mon Jul 21 05:43:36 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Keywords | loading matrix modeling; sieve approximation; high dimensionality; rates of convergence; semi-parametric factor models |
| Language | English |
| LinkModel | DirectLink |
| PMID | 26783374 |
| PQID | 1826655013 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_1826655013 pubmed_primary_26783374 |
| PublicationCentury | 2000 |
| PublicationDate | 2016-02-01 |
| PublicationDateYYYYMMDD | 2016-02-01 |
| PublicationDate_xml | – month: 02 year: 2016 text: 2016-02-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | The Annals of Statistics |
| PublicationTitleAlternate | Ann Stat |
| PublicationYear | 2016 |
| SSID | ssj0005943 |
| Score | 2.5777838 |
| Snippet | This paper introduces a Projected Principal Component Analysis (Projected-PCA), which employs principal component analysis to the projected (smoothed) data... |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database Index Database |
| StartPage | 219 |
| Title | PROJECTED PRINCIPAL COMPONENT ANALYSIS IN FACTOR MODELS |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/26783374 https://www.proquest.com/docview/1826655013 |
| Volume | 44 |
| hasFullText | |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | ProQuest |