Optimal stochastic gradient descent algorithm for filtering

Detailed bibliography
Published in: Digital signal processing, Volume 155, p. 104731
Main authors: Turali, M. Yigit, Koc, Ali T., Kozat, Suleyman S.
Medium: Journal Article
Language: English
Published: Elsevier Inc., 01.12.2024
ISSN: 1051-2004
Description
Summary: Stochastic Gradient Descent (SGD) is a fundamental optimization technique in machine learning owing to its efficiency in handling large-scale data. Unlike typical SGD applications, which rely on stochastic approximations, this work explores the convergence properties of SGD from a deterministic perspective. We address the crucial aspect of learning rate settings, a common obstacle in optimizing SGD performance, particularly in complex environments. In contrast to traditional methods, which often provide convergence results based on statistical expectations that are usually not justified, our approach introduces universally applicable learning rates. These rates ensure that a model trained with SGD asymptotically matches the performance of the best linear filter, irrespective of the data sequence length and independent of statistical assumptions about the data. By establishing learning rates that scale as μ = O(1/t), we offer a solution that sidesteps the need for prior data knowledge, a prevalent limitation in real-world applications. In this way, we provide a robust framework for applying SGD across varied settings, guaranteeing convergence results that hold under both deterministic and stochastic scenarios without any underlying assumptions.
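To make the μ = O(1/t) schedule concrete, here is a minimal sketch of SGD for a linear filtering problem with a step size μ_t = c/t. This is an illustration under stated assumptions, not the paper's exact algorithm: the filter length, the learning-rate constant c, the noise level, and the synthetic Gaussian data are all hypothetical choices for demonstration.

```python
# Minimal sketch: SGD for linear filtering with a mu_t = c / t
# learning-rate schedule (the O(1/t) decay discussed in the abstract).
# All problem parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

d = 5        # filter length (assumed)
T = 10_000   # number of samples
c = 1.0      # learning-rate constant (assumed)

w_true = rng.normal(size=d)   # "best" linear filter generating the data
w = np.zeros(d)               # SGD iterate

for t in range(1, T + 1):
    x = rng.normal(size=d)                # input regressor
    y = w_true @ x + 0.1 * rng.normal()   # noisy desired signal
    e = y - w @ x                         # instantaneous prediction error
    mu = c / t                            # O(1/t) learning rate
    w += mu * e * x                       # gradient step on the squared error

print("final parameter error:", np.linalg.norm(w - w_true))
```

A 1/t decay is the classical Robbins-Monro choice: the step sizes are square-summable, so the iterate settles down, while their sum diverges, so it can still travel arbitrarily far toward the best linear filter.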
DOI: 10.1016/j.dsp.2024.104731