Fast DNN training based on auxiliary function technique

Detailed bibliography
Published in: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2160-2164
Main authors: Tran, Dung T., Ono, Nobutaka, Vincent, Emmanuel
Medium: Conference paper
Language: English
Publication details: IEEE, 01.04.2015
ISSN: 1520-6149
Description
Summary: Deep neural networks (DNN) are typically optimized with stochastic gradient descent (SGD) using a fixed learning rate or an adaptive learning rate approach (ADAGRAD). In this paper, we introduce a new learning rule for neural networks that is based on an auxiliary function technique without parameter tuning. Instead of minimizing the objective function, a quadratic auxiliary function, which has a closed-form optimum, is recursively introduced layer by layer. We prove the monotonic decrease of the new learning rule. Our experiments show that the proposed algorithm converges faster and to a better local minimum than SGD. In addition, we propose a combination of the proposed learning rule and ADAGRAD which further accelerates convergence. Experimental evaluation on the MNIST database shows the benefit of the proposed approach in terms of digit recognition accuracy.
DOI: 10.1109/ICASSP.2015.7178353
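
The summary above describes the method only at a high level. As a rough illustration of the underlying auxiliary-function (majorization-minimization) idea, the sketch below trains a plain logistic-regression model by repeatedly minimizing a quadratic upper bound of the loss that has a closed-form minimizer, which removes the need for a tuned learning rate and yields a monotone decrease of the loss. This is not the authors' layer-by-layer DNN construction from the paper; the synthetic data, the model, and the Lipschitz bound are assumptions made purely for illustration.

```python
# Minimal sketch of a generic auxiliary-function (majorize-minimize) update,
# NOT the paper's layer-by-layer DNN construction: logistic regression trained
# by repeatedly minimizing a quadratic upper bound of the loss in closed form.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data (assumption: a toy stand-in, not MNIST).
n, d = 200, 20
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) > 0).astype(float)

def loss(w):
    # Numerically stable negative log-likelihood of logistic regression.
    z = X @ w
    return np.mean(np.maximum(z, 0) + np.log1p(np.exp(-np.abs(z))) - y * z)

def grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / n

# The quadratic auxiliary function
#   g(w; w_t) = loss(w_t) + grad(w_t)^T (w - w_t) + (L/2) ||w - w_t||^2
# majorizes the loss because the logistic Hessian is bounded by X^T X / (4n).
# Its closed-form minimizer is w_t - grad(w_t) / L, so no step size is tuned.
L = np.linalg.eigvalsh(X.T @ X).max() / (4 * n)

w = np.zeros(d)
prev = loss(w)
for _ in range(200):
    w = w - grad(w) / L              # closed-form minimum of the majorizer
    cur = loss(w)
    assert cur <= prev + 1e-12       # monotone decrease of the objective
    prev = cur

print(f"final training loss: {prev:.4f}")
```

The monotone decrease holds because each update exactly minimizes a function that upper-bounds the loss and matches it at the current iterate; the paper's claimed guarantee for its layer-wise rule rests on the same style of argument, applied to a bound built recursively through the network.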