Large-scale convex optimization: parallelization and variance reduction
| Title: | Large-scale convex optimization: parallelization and variance reduction |
|---|---|
| Authors: | Traore', Mohamed Cheik Ibrahim |
| Contributors: | Traore', Mohamed Cheik Ibrahim; Villa, Silvia; Vigni, Stefano |
| Publisher Information: | Università degli studi di Genova |
| Publication Year: | 2024 |
| Collection: | Università degli Studi di Genova: CINECA IRIS |
| Subject Terms: | Convex optimization, asynchronous algorithm, randomized block-coordinate descent, error bound, stochastic quasi-Fejér sequence, forward-backward algorithm, convergence rate, stochastic optimization, proximal point algorithm, variance reduction technique, SVRG, SAGA, Settore MAT/08 - Numerical Analysis, Settore MAT/09 - Operations Research |
| Description: | In this work, we investigate two aspects of large-scale optimization for convex functions defined on an infinite-dimensional separable Hilbert space: parallelized methods and incremental methods. These methods are used to efficiently solve problems that arise in data science, especially in machine learning and inverse problems. In parallelized optimization methods, the computational load of running the algorithm is distributed among several workers. For example, if the algorithm comprises a gradient computation, one can assign each worker one coordinate of the gradient to compute and then assemble the results. A parallelized algorithm is called synchronous if there is a synchronization phase in which the local information of all workers is updated, and asynchronous if there is no such phase. In practice, asynchronous implementations are preferred to synchronous ones; however, their analysis has to account for delayed information, which is modeled by a delay vector. In this document, we study an asynchronous version of random block-coordinate descent, in which only one randomly selected coordinate is used at each iteration. We consider a version in which the selection probabilities of the coordinates are arbitrary, in contrast to what is done in the literature for asynchronous algorithms, and we also allow a coordinate-wise stepsize rule. Under a convexity assumption, we prove weak convergence of the iterates and a sublinear convergence rate. Assuming an additional error bound condition, we prove a linear convergence rate and strong convergence of the iterates. In both cases, the dependence on the delay vector is linear. Incremental optimization methods are iterative algorithms used to minimize a function defined as a finite sum of functions: the function is minimized by using one summand at each iteration instead of the whole sum. We are interested in the case where the choice of the summand is random. This leads to stochastic algorithms such as stochastic gradient descent (SGD) or stochastic proximal point ... (illustrative sketches of both families of methods are given after this record) |
| Document Type: | doctoral or postdoctoral thesis |
| Language: | English |
| Relation: | https://hdl.handle.net/11567/1177695 |
| DOI: | 10.15167/traore-mohamed-cheik-ibrahim_phd2024-06-10 |
| Availability: | https://hdl.handle.net/11567/1177695 https://doi.org/10.15167/traore-mohamed-cheik-ibrahim_phd2024-06-10 |
| Rights: | info:eu-repo/semantics/openAccess |
| Accession Number: | edsbas.B5E67CF9 |
| Database: | BASE |
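
The asynchronous random block-coordinate descent described in the abstract reduces, in its simplest serial form, to updating one randomly chosen coordinate per iteration. The sketch below is a hypothetical, minimal illustration of that serial core on a smooth convex least-squares objective, with an arbitrary (non-uniform) selection distribution and a coordinate-wise stepsize rule; the asynchronous/delayed-update machinery, the proximal (forward-backward) step, and the infinite-dimensional Hilbert-space setting analysed in the thesis are all omitted, and every name and constant is illustrative.

```python
# Hypothetical toy sketch (not the thesis algorithm): serial random
# coordinate descent on f(x) = 0.5 * ||A x - b||^2 with an arbitrary
# selection distribution and coordinate-wise stepsizes.
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

probs = rng.random(n)
probs /= probs.sum()              # arbitrary (non-uniform) selection probabilities
L = (A ** 2).sum(axis=0)          # coordinate-wise Lipschitz constants of the gradient
steps = 1.0 / L                   # coordinate-wise stepsize rule

x = np.zeros(n)
for k in range(20_000):
    i = rng.choice(n, p=probs)        # draw one coordinate at random
    grad_i = A[:, i] @ (A @ x - b)    # i-th partial derivative of f at x
    x[i] -= steps[i] * grad_i         # update only the selected coordinate

print("residual norm:", np.linalg.norm(A @ x - b))
```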
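For the incremental part, the subject terms list SVRG and SAGA as variance-reduction techniques. The following hypothetical SVRG-style sketch for a finite-sum least-squares problem shows the general mechanism: one randomly chosen summand is used per inner step, and the full gradient at a snapshot point serves as a control variate that reduces the variance of the stochastic gradient. It is only a finite-dimensional illustration under assumed parameters, not the stochastic proximal-point or forward-backward schemes studied in the thesis.

```python
# Hypothetical toy sketch of SVRG-style variance reduction on
# f(x) = (1/N) * sum_i 0.5 * (a_i @ x - b_i)^2.
import numpy as np

rng = np.random.default_rng(1)
N, n = 500, 20
A = rng.standard_normal((N, n))
b = rng.standard_normal(N)

def grad_i(x, i):
    # Gradient of the i-th summand 0.5 * (a_i @ x - b_i)^2.
    return A[i] * (A[i] @ x - b[i])

L_max = (A ** 2).sum(axis=1).max()    # largest per-summand Lipschitz constant
step = 1.0 / (10.0 * L_max)           # conservative, assumed stepsize

x = np.zeros(n)
for epoch in range(30):
    x_snap = x.copy()
    full_grad = A.T @ (A @ x_snap - b) / N    # full gradient at the snapshot
    for _ in range(N):
        i = rng.integers(N)
        # Variance-reduced gradient estimate (control variate at the snapshot).
        g = grad_i(x, i) - grad_i(x_snap, i) + full_grad
        x -= step * g

print("objective:", 0.5 * np.mean((A @ x - b) ** 2))
```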