Detailed bibliography
| Title: |
Dimension Reduction for Large-Scale Federated Data: Statistical Rate and Asymptotic Inference. |
| Authors: |
Shen, Shuting (AUTHOR); Lu, Junwei (AUTHOR); Lin, Xihong (AUTHOR), xlin@hsph.harvard.edu |
| Source: |
Journal of the American Statistical Association. Mar 2026, Vol. 121, Issue 553, pp. 585-597 (13 pp.). |
| Subjects: |
*DISTRIBUTED computing, *BIG data, PRINCIPAL components analysis, INFERENTIAL statistics, FACTOR analysis, DISTRIBUTED databases, POPULATION differentiation |
| Abstract: |
In light of the rapidly growing large-scale data in federated ecosystems, the traditional principal component analysis (PCA) is often not applicable due to privacy protection considerations and large computational burden. Algorithms were proposed to lower the computational cost, but few can handle both high dimensionality and massive sample size under distributed settings. In this article, we propose the FAst DIstributed (FADI) PCA method for federated data when both the dimension d and the sample size n are ultra-large, by simultaneously performing parallel computing along d and distributed computing along n. Specifically, we use L parallel copies of p-dimensional fast sketches to divide the computing burden along d and aggregate the results distributively along the split samples. We present a general framework applicable to multiple statistical problems, and establish comprehensive theoretical results under the general framework. We show that FADI accelerates the computation while enjoying the same non-asymptotic error rate as the traditional PCA when Lp ≥ d. We also derive inferential results that characterize the asymptotic distribution of FADI, and show a phase-transition phenomenon as Lp increases. We perform extensive simulations to empirically validate our theoretical findings, and apply FADI to the 1000 Genomes data to study the population structure. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work. [ABSTRACT FROM AUTHOR] |
|
Copyright of Journal of the American Statistical Association is the property of Taylor & Francis Ltd and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: |
Business Source Index |
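
Illustrative note: the abstract describes splitting the work along d with L copies of p-dimensional sketches and aggregating along the sample split. The sketch below is one plausible reading of that idea, not the authors' published algorithm. It assumes centered data, Gaussian sketch matrices, and aggregation by averaging the L projection matrices; none of these choices is stated in the abstract. The names `fadi_pca_sketch`, `site_data`, and `seed` are hypothetical.

```python
import numpy as np

def fadi_pca_sketch(site_data, K, L, p, seed=0):
    """Estimate a top-K principal subspace from row-split (federated) data.

    site_data: list of (n_s x d) arrays, one per site, assumed centered.
    Requires p >= K. Assumption-laden illustration only.
    """
    rng = np.random.default_rng(seed)
    d = site_data[0].shape[1]
    n = sum(X.shape[0] for X in site_data)
    bases = []
    for _ in range(L):  # the L copies are independent and could run in parallel
        omega = rng.standard_normal((d, p))  # shared d x p Gaussian sketch (assumed)
        # Each site contributes only a d x p summary, never its raw rows.
        Y = sum(X.T @ (X @ omega) for X in site_data) / n
        U, _, _ = np.linalg.svd(Y, full_matrices=False)
        bases.append(U[:, :K])  # top-K left singular vectors of this sketch
    # The top-K left singular vectors of the stacked d x (K*L) matrix are the
    # top-K eigenvectors of the average of the L projection matrices.
    stacked = np.hstack(bases) / np.sqrt(L)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :K]

# Synthetic usage: 4 sites, d = 50, a strong rank-3 signal plus small noise.
rng = np.random.default_rng(1)
d, K = 50, 3
basis, _ = np.linalg.qr(rng.standard_normal((d, K)))
scales = np.array([10.0, 8.0, 6.0])
sites = [
    (rng.standard_normal((200, K)) * scales) @ basis.T
    + 0.1 * rng.standard_normal((200, d))
    for _ in range(4)
]
V = fadi_pca_sketch(sites, K=K, L=5, p=10)
# Frobenius distance between estimated and true projection matrices.
subspace_error = np.linalg.norm(V @ V.T - basis @ basis.T)
```

In this toy setting each sketch is only d x p, so no site forms or transmits a full d x d covariance matrix; how L and p should be chosen in practice is addressed by the article itself, not by this sketch.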