Multi-stage stochastic gradient method with momentum acceleration
| Published in: | Signal Processing, Volume 188, p. 108201 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.11.2021 |
| Subjects: | |
| ISSN: | 0165-1684, 1872-7557 |
| Online access: | Get full text |
| Summary: | •Stage-wise optimization and momentum have been widely employed to accelerate SGD. •Negative momentum provides acceleration and stabilization for stochastic first-order methods. •Negative momentum extends Nesterov's momentum to stage-wise optimization. •Gradient correction avoids oscillations and makes the stochastic gradient more effective and tolerant. Multi-stage optimization, which invokes a stochastic algorithm and restarts it from the solution returned by the previous stage, has been widely employed in stochastic optimization. Momentum acceleration is well known for producing gradient-based algorithms with fast convergence in large-scale optimization. To take advantage of this acceleration in multi-stage stochastic optimization, we develop a multi-stage stochastic gradient descent method with momentum acceleration, named MAGNET, for first-order stochastic convex optimization. The main ingredient is the use of a negative momentum, which extends Nesterov's momentum to multi-stage optimization. It can be incorporated into a stochastic gradient-based algorithm within the multi-stage mechanism to provide acceleration. The proposed algorithm obtains an accelerated rate of convergence, is adaptive, and is free from hyper-parameter tuning. The experimental results demonstrate that our algorithm is competitive with state-of-the-art methods on several typical optimization problems in machine learning. |
|---|---|
| ISSN: | 0165-1684, 1872-7557 |
| DOI: | 10.1016/j.sigpro.2021.108201 |
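
The record gives only this high-level description of MAGNET, so its exact update rules are not available here. As a rough illustration of the general pattern the abstract refers to (a stochastic gradient method with a momentum buffer, restarted at each stage from the solution returned by the previous stage), the following is a minimal sketch on a synthetic least-squares problem. The stage schedule, step sizes, momentum coefficient, and all function names are illustrative assumptions, not MAGNET's actual algorithm.

```python
# Minimal sketch of multi-stage SGD with a momentum term, illustrated on
# least-squares regression.  All hyper-parameters and schedules below are
# illustrative assumptions; the record does not specify MAGNET's updates.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: minimize 0.5 * ||A x - b||^2 / n.
n, d = 1000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)


def stochastic_grad(x, batch):
    """Mini-batch gradient of the least-squares objective."""
    Ab, bb = A[batch], b[batch]
    return Ab.T @ (Ab @ x - bb) / len(batch)


def momentum_sgd_stage(x0, step, beta, iters, batch_size=32):
    """One stage of SGD with a momentum buffer (heavy-ball style)."""
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(iters):
        batch = rng.integers(0, n, size=batch_size)
        g = stochastic_grad(x, batch)
        v = beta * v + g   # momentum buffer; a negative beta would give a
        x = x - step * v   # "negative momentum" variant
    return x


def multi_stage_sgd(x0, stages=5, step0=0.5, beta=0.9, iters=200):
    """Restart each stage from the previous stage's returned solution,
    shrinking the step size stage by stage (an assumed schedule)."""
    x = x0
    for s in range(stages):
        x = momentum_sgd_stage(x, step0 / (2 ** s), beta, iters)
        loss = 0.5 * np.mean((A @ x - b) ** 2)
        print(f"stage {s}: loss = {loss:.6f}")
    return x


x_hat = multi_stage_sgd(np.zeros(d))
```

Halving the step size between stages is just one common choice in multi-stage schemes; the adaptive, tuning-free schedule claimed in the abstract is not reproduced here.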