A randomized exponential canonical correlation analysis method for data analysis and dimensionality reduction

Bibliographic details
Published in: Applied Numerical Mathematics, vol. 164, pp. 101-124
Authors: Wu, Gang; Li, Fei
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 June 2021
ISSN:0168-9274, 1873-5460
Description
Abstract: Canonical correlation analysis (CCA) is a well-known data analysis method that has been used successfully in many areas. CCA extracts meaningful information from a pair of data sets by seeking pairs of linear combinations of two sets of variables that have maximum correlation. Mathematically, CCA reduces to solving a large-scale generalized eigenvalue problem. However, when the dimension of the data sets is much larger than the number of samples, CCA may suffer from the small-sample-size (SSS) problem and from over-fitting. Regularization is often applied to overcome these difficulties, but it is difficult to choose the optimal regularization parameter in advance. In this work, we propose an Exponential Canonical Correlation Analysis (ECCA) method based on the matrix exponential, which is parameter-free and fundamentally overcomes the over-fitting and SSS problems. However, the computational overhead of ECCA is very high in practice. Based on the randomized singular value decomposition (RSVD), we therefore propose a Randomized Exponential Canonical Correlation Analysis (RECCA) method for data analysis and dimensionality reduction. Theoretical results are given to justify this randomized method and to establish the relationship between RECCA and ECCA. Numerical experiments on several real-world, high-dimensional, large-sample data sets illustrate the superiority of the proposed algorithms over many state-of-the-art CCA algorithms.
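The abstract does not give the ECCA or RECCA formulas, so the Python (NumPy/SciPy) sketch below only illustrates the standard ingredients it names. First, it poses CCA as a symmetric generalized eigenvalue problem. Second, it uses the fact that the matrix exponential of a symmetric positive semidefinite covariance matrix has eigenvalues exp(lambda) >= 1 and is therefore invertible even in the SSS regime, where the covariance itself is singular. Third, it implements a basic randomized SVD built from a Gaussian range sketch and power iterations. The helper names (`covariances`, `cca_gep`, `randomized_svd`) are hypothetical. Placing `expm(Cxx)` and `expm(Cyy)` in the constraint matrix is an assumed illustration, not the paper's ECCA algorithm, and the sketch does not show how RECCA combines RSVD with ECCA.

```python
import numpy as np
from scipy.linalg import eigh, expm


def covariances(X, Y):
    """Return Cxx, Cyy, Cxy for data matrices X (p x n) and Y (q x n).

    Columns are samples; each variable (row) is centered first.
    """
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    return Xc @ Xc.T / n, Yc @ Yc.T / n, Xc @ Yc.T / n


def cca_gep(Bxx, Byy, Cxy, k):
    """Solve the symmetric generalized eigenproblem

        [[0, Cxy], [Cxy^T, 0]] w = lam * [[Bxx, 0], [0, Byy]] w

    and return the k largest eigenvalues with the x- and y-parts of their
    eigenvectors. Bxx = Cxx, Byy = Cyy gives classical CCA; scipy's eigh
    requires the right-hand block matrix to be positive definite.
    """
    p, q = Cxy.shape
    A = np.block([[np.zeros((p, p)), Cxy], [Cxy.T, np.zeros((q, q))]])
    B = np.block([[Bxx, np.zeros((p, q))], [np.zeros((q, p)), Byy]])
    vals, vecs = eigh(A, B)              # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]     # indices of the k largest
    return vals[idx], vecs[:p, idx], vecs[p:, idx]


def randomized_svd(M, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k SVD of M with a Gaussian sketch and power iterations."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((M.shape[1], k + oversample))
    Q, _ = np.linalg.qr(M @ Omega)       # orthonormal basis for the sketched range
    for _ in range(n_iter):              # power iterations sharpen decaying spectra
        Q, _ = np.linalg.qr(M.T @ Q)
        Q, _ = np.linalg.qr(M @ Q)
    U_small, s, Vt = np.linalg.svd(Q.T @ M, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, q = 20, 50, 40                 # fewer samples than features: the SSS regime
    Z = rng.standard_normal((3, n))      # latent signal shared by both views
    X = rng.standard_normal((p, 3)) @ Z + 0.1 * rng.standard_normal((p, n))
    Y = rng.standard_normal((q, 3)) @ Z + 0.1 * rng.standard_normal((q, n))
    # Demo-only rescaling so covariance eigenvalues lie in [0, 1]
    # and expm(C) stays well conditioned.
    X = X / np.linalg.norm(X, 2) * np.sqrt(n)
    Y = Y / np.linalg.norm(Y, 2) * np.sqrt(n)
    Cxx, Cyy, Cxy = covariances(X, Y)
    print("rank(Cxx) =", np.linalg.matrix_rank(Cxx), "of", p)  # at most n - 1
    # expm of a symmetric PSD matrix has eigenvalues exp(lam) >= 1, so it is
    # invertible even though Cxx is singular (hypothetical exponential variant).
    vals, Wx, Wy = cca_gep(expm(Cxx), expm(Cyy), Cxy, k=3)
    print("top generalized eigenvalues:", vals)
```

When run as a script, the demo reports the rank deficiency of `Cxx` and the top generalized eigenvalues of the assumed exponential variant. It is a toy illustration and does not reproduce any result from the article.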
DOI:10.1016/j.apnum.2020.09.013