Approximation Algorithms for Training One-Node ReLU Neural Networks

Bibliographic Details
Published in:	IEEE Transactions on Signal Processing, vol. 68, pp. 6696-6706
Main authors:	Dey, Santanu S.; Wang, Guanyi; Xie, Yao
Format:	Journal Article
Language:	English
Published:	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2020
Subjects:	Algorithms; Approximation; Approximation algorithms; Biological neural networks; Machine learning; Machine learning algorithms; Neural networks; Non-convex optimization; Optimization; Optimized production technology; Performance enhancement; Random noise; Ratios; Signal processing algorithms; Training; Training neural networks
ISSN:	1053-587X, 1941-0476

Description
Abstract:	Training a one-node neural network with the ReLU activation function via optimization, which we refer to as the ON-ReLU problem, is a fundamental problem in machine learning. In this paper, we begin by proving the NP-hardness of the ON-ReLU problem. We then present an approximation algorithm to solve the ON-ReLU problem, whose running time is $\mathcal{O}(n^k)$, where $n$ is the number of samples and $k$ is a predefined integral constant that serves as an algorithm parameter. We analyze the performance of this algorithm under two regimes and show that: (1) given an arbitrary set of training samples, the algorithm guarantees an $(n/k)$-approximation for the ON-ReLU problem (to the best of our knowledge, this is the first time that an algorithm guarantees an approximation ratio for arbitrary data); thus, in the ideal case (i.e., when the training error is zero), the approximation algorithm achieves the globally optimal solution for the ON-ReLU problem; and (2) given training samples with Gaussian noise, the same approximation algorithm achieves a much better asymptotic approximation ratio, which is independent of the number of samples $n$. Extensive numerical studies show that our approximation algorithm can perform better than the gradient descent algorithm. Our numerical results also show that the solution of the approximation algorithm can provide a good initialization for gradient descent, which can significantly improve the performance.
DOI:	10.1109/TSP.2020.3039360
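
The ON-ReLU problem named in the abstract can be illustrated with a small sketch. The record does not state the objective, so the code below assumes a squared-error loss over a single ReLU unit with no bias term; the names `on_relu_loss` and `gradient_descent` are hypothetical. The second function is only the gradient descent baseline that the abstract compares against, not the authors' approximation algorithm, whose details are not given here.

```python
import numpy as np


def on_relu_loss(w, X, y):
    """Assumed ON-ReLU objective: sum_i (max(0, x_i . w) - y_i)^2."""
    return float(np.sum((np.maximum(X @ w, 0.0) - y) ** 2))


def gradient_descent(X, y, w0, lr=0.001, steps=500):
    """Plain (sub)gradient descent on the assumed objective (baseline only)."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps):
        z = X @ w
        residual = np.maximum(z, 0.0) - y
        # Subgradient of the ReLU: 1 where the pre-activation is positive, else 0.
        grad = 2.0 * X.T @ (residual * (z > 0))
        w -= lr * grad
    return w


# Synthetic data for illustration (not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # n = 200 samples, 3 features
w_true = np.array([1.0, -2.0, 0.5])
y = np.maximum(X @ w_true, 0.0)          # noiseless labels: zero training error is attainable

w0 = 0.1 * np.ones(3)
w_hat = gradient_descent(X, y, w0)
print("loss at start:", on_relu_loss(w0, X, y))
print("loss after gradient descent:", on_relu_loss(w_hat, X, y))
```

Because the assumed objective is non-convex, the point this baseline reaches depends on `w0`; that is the setting in which the abstract reports that the approximation algorithm's solution can serve as a good initialization.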