Batch Gradient Learning Algorithm with Smoothing L1 Regularization for Feedforward Neural Networks

Detailed bibliography
Published in: Computers (Basel), Volume 12, Issue 1, p. 4
Main author: Mohamed, Khidir Shaib
Medium: Journal Article
Language: English
Published: Basel: MDPI AG, 01.01.2023
ISSN: 2073-431X
Description
Summary: Regularization techniques are critical in the development of machine learning models. Complex models, such as neural networks, are particularly prone to overfitting: they fit the training data closely but perform poorly on unseen data. L1 regularization is one of the most direct ways to enforce sparsity; it does not lead to an NP-hard problem, but the 1-norm is non-differentiable at the origin, which complicates gradient-based training. Proximal methods can nevertheless optimize the L1-regularized objective efficiently and with good convergence speed. In this paper, we propose a batch gradient learning algorithm with smoothing L1 regularization (BGSL1) for training and pruning a feedforward neural network with hidden nodes. To this end, we propose a smoothing (differentiable) function that removes the non-differentiability of the L1 term at the origin, speeds up convergence, improves the network's ability to select its structure, and yields a stronger mapping. Under these conditions, strong and weak convergence theorems are provided. We use N-dimensional parity problems and function approximation problems in our experiments. Preliminary findings indicate that BGSL1 converges faster and generalizes better than BGL1/2, BGL1, BGL2, and BGSL1/2. We also show that the error function decreases monotonically and that the norm of its gradient approaches zero, validating the theoretical findings and the superiority of the proposed technique.
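The record does not reproduce the paper's smoothing function, so the following is only a minimal sketch of the general idea: batch gradient descent on a squared-error loss plus a differentiable surrogate for the L1 penalty. The surrogate sqrt(w^2 + eps), the one-hidden-layer sigmoid network, and the 3-bit parity task are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of batch gradient training with a smoothed L1 penalty.
# Assumptions (not taken from the paper): the smoothing function
# f(w) = sqrt(w**2 + eps) stands in for the paper's smoothing of |w|,
# and the network is a one-hidden-layer sigmoid net trained on 3-bit parity.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def smoothed_l1(W, eps=1e-4):
    """Differentiable surrogate for sum(|w|); the exact form is an assumption."""
    return np.sum(np.sqrt(W ** 2 + eps))

def smoothed_l1_grad(W, eps=1e-4):
    """Gradient of the surrogate: w / sqrt(w^2 + eps)."""
    return W / np.sqrt(W ** 2 + eps)

# 3-bit parity data (the paper's experiments use N-dimensional parity problems).
X = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)], float)
y = (X.sum(axis=1) % 2).reshape(-1, 1)

# One hidden layer with a few nodes; pruning would zero out small weights.
n_hidden, lam, lr, eps = 8, 1e-3, 0.5, 1e-4
W1 = rng.normal(scale=0.5, size=(3, n_hidden))
W2 = rng.normal(scale=0.5, size=(n_hidden, 1))

for epoch in range(20000):
    # Forward pass over the full batch (batch gradient learning).
    H = sigmoid(X @ W1)
    out = sigmoid(H @ W2)

    # Squared-error term plus the smoothed L1 penalty on all weights.
    loss = 0.5 * np.sum((out - y) ** 2) + lam * (smoothed_l1(W1, eps) + smoothed_l1(W2, eps))

    # Backpropagation of the squared-error term.
    d_out = (out - y) * out * (1 - out)
    dW2 = H.T @ d_out
    d_hidden = (d_out @ W2.T) * H * (1 - H)
    dW1 = X.T @ d_hidden

    # Add the gradient of the smoothed penalty and take one batch step.
    W1 -= lr * (dW1 + lam * smoothed_l1_grad(W1, eps))
    W2 -= lr * (dW2 + lam * smoothed_l1_grad(W2, eps))

print("final loss:", loss)
print("predictions:", out.ravel().round(2))
```

After training, weights driven close to zero by the penalty are candidates for pruning, which is the structure-selection effect the abstract refers to.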
DOI: 10.3390/computers12010004