Multi-stage stochastic gradient method with momentum acceleration

Detailed bibliography
Published in: Signal Processing, Volume 188, p. 108201
Main authors: Luo, Zhijian; Chen, Siyu; Qian, Yuntao; Hou, Yueen
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.11.2021
ISSN: 0165-1684, 1872-7557
Description
Summary:
•Stage-wise optimization and momentum have been widely employed to accelerate SGD.
•Negative momentum provides acceleration and stabilization for stochastic first-order methods.
•Negative momentum extends Nesterov's momentum to stage-wise optimization.
•Gradient correction avoids oscillations and makes the stochastic gradient more effective and tolerant.

Multi-stage optimization, in which a stochastic algorithm is restarted from the solution returned by the previous stage, has been widely employed in stochastic optimization. The momentum acceleration technique is well known for building gradient-based algorithms with fast convergence in large-scale optimization. To exploit this acceleration in multi-stage stochastic optimization, we develop a multi-stage stochastic gradient descent method with momentum acceleration, named MAGNET, for first-order stochastic convex optimization. The main ingredient is the use of a negative momentum, which extends Nesterov's momentum to multi-stage optimization. It can be incorporated into a stochastic gradient-based algorithm within the multi-stage mechanism to provide acceleration. The proposed algorithm achieves an accelerated rate of convergence, and is adaptive and free from hyper-parameter tuning. Experimental results demonstrate that our algorithm is competitive with several state-of-the-art methods on typical optimization problems in machine learning.
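
To make the multi-stage mechanism described in the abstract concrete, the following is a minimal illustrative sketch of a generic multi-stage SGD with momentum restarts in Python/NumPy. It is not the authors' MAGNET algorithm: the heavy-ball momentum update, the per-stage step-size shrinking, the stage lengths, and the toy least-squares objective are assumptions made purely for illustration; the paper's negative-momentum update and its adaptive, tuning-free schedule are not reproduced here.

```python
import numpy as np

def multistage_sgd_momentum(grad_fn, x0, *, num_stages=5, iters_per_stage=200,
                            eta0=0.02, beta=0.9, shrink=0.5, seed=0):
    """Generic multi-stage SGD with heavy-ball momentum (illustrative only).

    Each stage restarts from the solution returned by the previous stage,
    resets the momentum buffer, and shrinks the step size. This mirrors the
    multi-stage mechanism described in the abstract, not the MAGNET method.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    eta = eta0
    for _ in range(num_stages):
        v = np.zeros_like(x)            # momentum buffer, reset every stage
        for _ in range(iters_per_stage):
            g = grad_fn(x, rng)         # stochastic gradient at current point
            v = beta * v + g            # heavy-ball momentum accumulation
            x = x - eta * v
        eta *= shrink                   # smaller step size in the next stage
    return x

# Toy usage on a noiseless least-squares problem f(x) = 0.5 * E||A x - b||^2
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(500, 20))
    x_true = rng.normal(size=20)
    b = A @ x_true

    def grad_fn(x, rng, batch=32):
        idx = rng.integers(0, A.shape[0], size=batch)
        Ab, bb = A[idx], b[idx]
        return Ab.T @ (Ab @ x - bb) / batch

    x_hat = multistage_sgd_momentum(grad_fn, np.zeros(20))
    print("distance to optimum:", np.linalg.norm(x_hat - x_true))
```

Restarting each stage from the previous stage's solution while resetting the momentum buffer is the design choice that defines the multi-stage mechanism; the paper's contribution, per the abstract, is a negative-momentum variant of this scheme with an accelerated convergence rate and no hyper-parameter tuning.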
DOI: 10.1016/j.sigpro.2021.108201