Large-scale convex optimization: parallelization and variance reduction

Detailed bibliography
Title: Large-scale convex optimization: parallelization and variance reduction
Authors: TRAORE', MOHAMED CHEIK IBRAHIM
Contributors: Traore', MOHAMED CHEIK IBRAHIM, VILLA, SILVIA, VIGNI, STEFANO
Publisher information: Università degli studi di Genova
Publication year: 2024
Collection: Università degli Studi di Genova: CINECA IRIS
Subjects: Convex optimization, asynchronous algorithm, randomized block-coordinate descent, error bound, stochastic quasi-Fejér sequence, forward-backward algorithm, convergence rate, stochastic optimization, proximal point algorithm, variance reduction technique, SVRG, SAGA, Settore MAT/08 - Analisi Numerica, Settore MAT/09 - Ricerca Operativa
Description: In this work, we investigate two aspects of large-scale optimization for convex functions defined on an infinite-dimensional separable Hilbert space: parallelized methods and incremental methods. These methods are used to efficiently solve problems that arise in data science, especially in machine learning and inverse problems. In parallelized optimization methods, the computational load of running the algorithm is distributed among several workers. For example, if the algorithm comprises a gradient computation, one can assign each worker a coordinate of the gradient to compute and then assemble the results. A parallelized algorithm is called synchronous if there is a synchronization phase in which the local information of all workers is updated; it is called asynchronous if there is no such phase. In practice, asynchronous implementations are preferred to synchronous ones. However, their analysis has to account for delayed information, which is modeled by a delay vector. In this document, we study an asynchronous version of random block coordinate descent, where only one randomly selected coordinate is used at each iteration. We consider a version in which the selection probability of the coordinates is arbitrary, in contrast to what is done in the literature for asynchronous algorithms. We also allow a coordinate-wise stepsize rule. Under a convexity assumption, we prove weak convergence of the iterates and a sublinear convergence rate. Assuming an additional error bound condition, we prove a linear convergence rate and strong convergence of the iterates. In both cases, the dependence on the delay vector is linear. Incremental optimization methods are iterative algorithms used to minimize a function defined as a finite sum of functions. The function is then minimized by using one summand at each iteration instead of the whole function. We are interested in the case where the choice of the summand is random. This leads to stochastic algorithms such as stochastic gradient descent (SGD) or stochastic proximal point ... (Illustrative code sketches of these two algorithm families are given after the record below.)
Document type: doctoral or postdoctoral thesis
Language: English
Relation: https://hdl.handle.net/11567/1177695
DOI: 10.15167/traore-mohamed-cheik-ibrahim_phd2024-06-10
Availability: https://hdl.handle.net/11567/1177695
https://doi.org/10.15167/traore-mohamed-cheik-ibrahim_phd2024-06-10
Rights: info:eu-repo/semantics/openAccess
Accession number: edsbas.B5E67CF9
Database: BASE
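
The Description field above outlines two algorithm families. As a minimal, non-authoritative illustration of the first one, the Python sketch below implements a plain (synchronous) randomized block coordinate forward-backward step with an arbitrary block-selection distribution and block-wise stepsizes; the asynchronous, delayed-information version analyzed in the thesis is not reproduced here, and all names (f_grad_block, prox_block, blocks, p, gamma) are illustrative assumptions rather than the thesis' notation.

```python
import numpy as np

def random_block_coordinate_descent(x0, f_grad_block, prox_block, blocks,
                                    p, gamma, n_iters=1000, rng=None):
    """Minimize f(x) + g(x) with g separable across blocks (illustrative sketch).

    f_grad_block(x, b): partial gradient of the smooth term f on the index set b
    prox_block(v, j, s): proximal step of the separable term on block j,
                         applied to the vector v with stepsize s
    blocks: list of index arrays partitioning the coordinates
    p:      arbitrary block-selection probabilities (must sum to 1)
    gamma:  per-block stepsizes (the coordinate-wise stepsize rule)
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        j = rng.choice(len(blocks), p=p)       # arbitrary selection law
        b = blocks[j]
        grad_b = f_grad_block(x, b)            # partial gradient on block b only
        x[b] = prox_block(x[b] - gamma[j] * grad_b, j, gamma[j])  # forward-backward update
    return x


# Hypothetical usage: ridge-regularized least squares split into two blocks.
if __name__ == "__main__":
    A = np.random.default_rng(0).standard_normal((50, 10))
    y = np.random.default_rng(1).standard_normal(50)
    lam = 0.1
    blocks = [np.arange(0, 5), np.arange(5, 10)]
    grad = lambda x, b: A[:, b].T @ (A @ x - y)        # partial gradient of 0.5*||Ax - y||^2
    prox = lambda v, j, s: v / (1.0 + s * lam)         # prox of 0.5*lam*||.||^2 on any block
    gamma = [1.0 / np.linalg.norm(A[:, b], 2) ** 2 for b in blocks]  # block Lipschitz stepsizes
    x_hat = random_block_coordinate_descent(np.zeros(10), grad, prox, blocks,
                                            p=[0.3, 0.7], gamma=gamma, n_iters=2000)
```

For the incremental part, the subject keywords mention the variance reduction techniques SVRG and SAGA. The sketch below shows the standard SVRG loop, in which the full gradient at a snapshot point is reused to reduce the variance of the per-summand stochastic gradient; it is given only as a generic reference for the technique named in the keywords, not as the specific scheme studied in the thesis, and grad_i and the parameters are assumptions.

```python
import numpy as np

def svrg(x0, grad_i, n, step, n_epochs=20, inner_steps=None, rng=None):
    """Minimize (1/n) * sum_i f_i(x), where grad_i(x, i) returns grad f_i(x)."""
    rng = np.random.default_rng() if rng is None else rng
    inner_steps = n if inner_steps is None else inner_steps
    x = np.array(x0, dtype=float)
    for _ in range(n_epochs):
        x_snap = x.copy()                      # snapshot point for this epoch
        full_grad = np.mean([grad_i(x_snap, i) for i in range(n)], axis=0)
        for _ in range(inner_steps):
            i = rng.integers(n)                # one random summand per iteration
            # variance-reduced direction: unbiased for the full gradient,
            # with variance vanishing as x and x_snap approach a minimizer
            v = grad_i(x, i) - grad_i(x_snap, i) + full_grad
            x = x - step * v
    return x
```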