Exact and Approximation Algorithms for Sparse Principal Component Analysis

Detailed bibliography
Title: Exact and Approximation Algorithms for Sparse Principal Component Analysis
Authors: Yongchun Li, Weijun Xie
Source: INFORMS Journal on Computing. 37:582-602
Publisher information: Institute for Operations Research and the Management Sciences (INFORMS), 2025.
Publication year: 2025
Subjects: 0211 other engineering and technologies, 02 engineering and technology, 0101 mathematics, 01 natural sciences
Description: Sparse principal component analysis (SPCA) is designed to enhance the interpretability of traditional principal component analysis by optimally selecting a subset of features that comprise the first principal component. Given the NP-hard nature of SPCA, most current approaches resort to approximate solutions, typically achieved through tractable semidefinite programs or heuristic methods. To solve SPCA to optimality, we propose two exact mixed-integer semidefinite programs (MISDPs) and an arbitrarily equivalent mixed-integer linear program. The MISDPs allow us to design an effective branch-and-cut algorithm with closed-form cuts that do not need to solve dual problems. For the proposed mixed-integer formulations, we further derive the theoretical optimality gaps of their continuous relaxations. Besides, we apply the greedy and local search algorithms to solving SPCA and derive their first-known approximation ratios. Our numerical experiments reveal that the exact methods we developed can efficiently find optimal solutions for data sets containing hundreds of features. Furthermore, our approximation algorithms demonstrate both scalability and near-optimal performance when benchmarked on larger data sets, specifically those with thousands of features.
History: Accepted by Andrea Lodi, Area Editor for Design & Analysis of Algorithms—Discrete.
Funding: This research was supported in part by the Division of Civil, Mechanical and Manufacturing Innovation [Grant 224614], the Division of Computing and Communication Foundations [Grant 2246417], and the Office of Naval Research [Grant N00014-24-1-2066].
Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2022.0372) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2022.0372). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.
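Illustrative note: the greedy and local search heuristics named in the abstract can be sketched as follows. This is a minimal sketch and not the authors' implementation (their code is in the repository linked above). It assumes the common subset-selection form of SPCA, where for a covariance matrix A and sparsity level k one maximizes the largest eigenvalue of the k-by-k principal submatrix A[S, S] over index sets S. The function names largest_eig, greedy_spca, and local_search_spca, and the tolerance parameter tol, are hypothetical and chosen only for this example.

```python
import numpy as np


def largest_eig(A, S):
    """Largest eigenvalue of the principal submatrix of A indexed by S."""
    idx = np.array(sorted(S))
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigvalsh(A[np.ix_(idx, idx)])[-1]


def greedy_spca(A, k):
    """Grow the support one feature at a time, always adding the feature
    that yields the largest top eigenvalue of the selected submatrix."""
    n = A.shape[0]
    S = []
    for _ in range(k):
        candidates = [j for j in range(n) if j not in S]
        best = max(candidates, key=lambda j: largest_eig(A, S + [j]))
        S.append(best)
    return sorted(S)


def local_search_spca(A, S, tol=1e-12):
    """Swap one selected feature for one unselected feature whenever the
    swap strictly improves the objective; stop when no swap helps."""
    n = A.shape[0]
    S = list(S)
    current = largest_eig(A, S)
    improved = True
    while improved:
        improved = False
        for i in list(S):
            for j in range(n):
                if j in S:
                    continue
                T = [t for t in S if t != i] + [j]
                val = largest_eig(A, T)
                if val > current + tol:
                    S, current, improved = T, val, True
                    break
            if improved:
                break
    return sorted(S), current
```

A typical use would be to call greedy_spca(A, k) on a sample covariance matrix and pass the resulting support to local_search_spca as a warm start. Each objective evaluation here recomputes an eigendecomposition from scratch, which keeps the sketch short but would be too slow for the data sets with thousands of features mentioned in the abstract.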
Document type: Article
Language: English
ISSN: 1526-5528, 1091-9856
DOI: 10.1287/ijoc.2022.0372
Accession number: edsair.doi...........1ee2f532070adb42d9c287cc121c15d4
Database: OpenAIRE