Distributed algorithm for best subset regression
| Published in: | Expert Systems with Applications, Vol. 277, p. 127224 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 05.06.2025 |
| ISSN: | 0957-4174 |
| Summary: | High-dimensional massive data modeling faces critical challenges in computational efficiency, memory constraints, and privacy protection. We develop a distributed framework for best subset regression with convex twice-differentiable losses (e.g., linear, multiplicative, and logistic regression). The proposed distributed enhanced primal–dual active set (DEPDAS) algorithm employs enhanced distributed computing to efficiently approximate optimal solutions in low-dimensional parameter spaces. Under standard regularity conditions, DEPDAS preserves the statistical properties of the full-sample-based EPDAS algorithm, including optimal estimation error rates and Oracle properties. With a per-iteration communication cost of O(2T+2p) for DEPDAS, our master-machine initialization strategy accelerates convergence while reducing communication overhead. Furthermore, we derive a lower-communication DEPDAS (LCDEPDAS) variant with O(4T) per-iteration cost. Extensive simulations and empirical studies demonstrate the superiority of both algorithms over state-of-the-art methods in estimation accuracy and prediction performance. |
| DOI: | 10.1016/j.eswa.2025.127224 |
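
This record does not include the paper's implementation. As a rough illustration of the master-worker pattern the summary describes — workers exchanging low-dimensional statistics with a master that maintains a size-T active set — here is a minimal NumPy sketch for the squared-loss case. It is not the authors' DEPDAS algorithm: the screening rule, unit step size, and all variable names below are assumptions made for illustration only.

```python
# A minimal, illustrative master-worker iteration for T-sparse least
# squares. NOT the authors' DEPDAS: the screening rule, unit step size,
# and all names here are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, p, T, M = 1200, 50, 5, 4          # samples, features, sparsity, workers

# Synthetic T-sparse ground truth; rows of (X, y) are sharded across M workers.
beta_true = np.zeros(p)
beta_true[:T] = rng.normal(0.0, 2.0, T)
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.1 * rng.standard_normal(n)
shards = list(zip(np.array_split(X, M), np.array_split(y, M)))

beta = np.zeros(p)                   # master's current sparse estimate
for _ in range(10):
    # Workers -> master: local gradients of the averaged squared loss (O(p) each).
    grad = sum(Xi.T @ (Xi @ beta - yi) for Xi, yi in shards) / n

    # Master: primal-dual style screening -- keep the T coordinates with the
    # largest combined |primal value + negative-gradient (dual) value|.
    active = np.argsort(np.abs(beta - grad))[-T:]

    # Workers -> master: low-dimensional sufficient statistics for the active
    # set (a T x T Gram block and a length-T vector), so the master can solve
    # the restricted least-squares problem exactly.
    G = sum(Xi[:, active].T @ Xi[:, active] for Xi, _ in shards)
    b = sum(Xi[:, active].T @ yi for Xi, yi in shards)

    beta = np.zeros(p)
    beta[active] = np.linalg.solve(G, b)

print("recovered support:", sorted(active.tolist()))
print("true support:     ", list(range(T)))
```

Under these assumptions, each round moves only per-worker gradients plus the small active-set statistics, which echoes — but does not reproduce — the O(2T+2p) and O(4T) per-iteration communication costs quoted in the summary for DEPDAS and LCDEPDAS.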