Convergence analysis of the batch gradient-based neuro-fuzzy learning algorithm with smoothing L1/2 regularization for the first-order Takagi–Sugeno system
| Published in: | Fuzzy Sets and Systems, Volume 319, pp. 28–49 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 15.07.2017 |
| Subjects: | |
| ISSN: | 0165-0114, 1872-6801 |
| Online access: | Get full text |
| Summary: | It has been proven that Takagi–Sugeno systems are universal approximators, and they are widely applied to classification and regression problems. The main challenges for these models are convergence analysis and the computational complexity caused by the large number of connections, together with the pruning of unnecessary parameters. The neuro-fuzzy learning algorithm therefore involves two tasks: generating comparably sparse networks and training the parameters. Regularization methods have attracted increasing attention for network pruning, particularly the Lq (0 < q < 1) regularizers introduced after L1 regularization, which can obtain better solutions to sparsity problems. The L1/2 regularizer has a particularly strong sparsity-inducing capacity and is representative of the Lq (0 < q < 1) family. However, the nonsmoothness of the L1/2 regularizer may lead to oscillations in the learning process. In this study, we propose a gradient-based neuro-fuzzy learning algorithm with a smoothing L1/2 regularization for the first-order Takagi–Sugeno fuzzy inference system (see the sketch after this record). The proposed approach has three advantages: (i) it improves on the original L1/2 regularizer by eliminating the oscillation of the gradient of the cost function during training; (ii) it prunes more inactive connections than the original L1/2 regularizer, with structure and parameter learning carried out simultaneously; and (iii) it admits a theoretical convergence analysis, which is the explicit focus of this study. We also provide a series of simulations demonstrating that the smoothing L1/2 regularization can often obtain more compressive representations than the standard L1/2 regularization. |
|---|---|
| DOI: | 10.1016/j.fss.2016.07.003 |
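
The record above does not include the paper's formulas, so the following is a minimal, hedged sketch of how a smoothed L1/2 penalty of the kind described in the abstract is commonly constructed. The piecewise-polynomial smoothing f, the half-width a of the smoothing interval, the unregularized error E_0, the penalty weight λ, and the learning rate η are notation introduced here for illustration and may differ from the paper's exact definitions.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Sketch only: one commonly used piecewise-polynomial smoothing of |w| near zero
% (an assumption; the paper's exact smoothing function may differ).
\[
f(w) =
\begin{cases}
|w|, & |w| \ge a, \\[4pt]
-\dfrac{w^{4}}{8a^{3}} + \dfrac{3w^{2}}{4a} + \dfrac{3a}{8}, & |w| < a .
\end{cases}
\]

% f matches |w| in value and slope at |w| = a, and f(0) = 3a/8 > 0,
% so the square root below is differentiable everywhere.

% Smoothed L_{1/2}-regularized batch cost over all parameters w_k
% (E_0 is the plain output error, lambda > 0 the penalty weight):
\[
E(\mathbf{w}) = E_0(\mathbf{w}) + \lambda \sum_{k} \bigl(f(w_k)\bigr)^{1/2} .
\]

% Batch gradient update with learning rate eta; the penalty term stays bounded
% because f(w_k) >= 3a/8 > 0, which is what removes the oscillation caused by
% the nonsmooth |w_k|^{1/2} penalty:
\[
w_k^{(t+1)} = w_k^{(t)} - \eta \left(
  \frac{\partial E_0}{\partial w_k}\Big|_{\mathbf{w}^{(t)}}
  + \frac{\lambda\, f'\bigl(w_k^{(t)}\bigr)}{2\sqrt{f\bigl(w_k^{(t)}\bigr)}}
\right) .
\]

\end{document}
```

Under this kind of smoothing, weights whose magnitudes fall below a small threshold after training can be pruned, which is consistent with the simultaneous structure-and-parameter learning described in the abstract; the specific threshold rule used in the paper is not given in this record.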