Large-scale convex optimization: parallelization and variance reduction
| Title: | Large-scale convex optimization: parallelization and variance reduction |
|---|---|
| Authors: | Traore', Mohamed Cheik Ibrahim |
| Contributors: | Traore', Mohamed Cheik Ibrahim; Villa, Silvia; Vigni, Stefano |
| Publisher Information: | Università degli studi di Genova |
| Publication Year: | 2024 |
| Collection: | Università degli Studi di Genova: CINECA IRIS |
| Subject Terms: | Convex optimization, asynchronous algorithm, randomized block-coordinate descent, error bound, stochastic quasi-Fejér sequence, forward-backward algorithm, convergence rate, stochastic optimization, proximal point algorithm, variance reduction technique, SVRG, SAGA, Settore MAT/08 - Numerical Analysis, Settore MAT/09 - Operations Research |
| Description: | In this work, we investigate two aspects of large-scale optimization for convex functions defined on an infinite-dimensional separable Hilbert space: parallelized methods and incremental methods. These methods are used to efficiently solve problems that arise in data science, especially in machine learning and inverse problems. In parallelized optimization methods, the computational load of running the algorithm is distributed among several workers. For example, if the algorithm comprises a gradient computation, one can assign each worker one coordinate of the gradient to compute and then assemble the results. A parallelized algorithm is called synchronous if there is a synchronization phase in which the local information of all workers is updated, and asynchronous if there is no such phase. In practice, asynchronous implementations are preferred to synchronous ones; however, their analysis has to account for delayed information, which is modeled by a delay vector. In this document, we study an asynchronous version of random block-coordinate descent, in which only one randomly selected coordinate is used at each iteration. We consider a version in which the selection probabilities of the coordinates are arbitrary, in contrast to what is done in the literature for asynchronous algorithms, and we also allow a coordinate-wise stepsize rule. Under a convexity assumption, we prove weak convergence of the iterates and a sublinear convergence rate. Assuming an additional error bound condition, we prove a linear convergence rate and strong convergence of the iterates. In both cases, the dependence on the delay vector is linear. Incremental optimization methods are iterative algorithms used to minimize a function defined as a finite sum of functions: the function is minimized by using one summand at each iteration instead of the whole sum. We are interested in the case where the choice of the summand is random. This leads to stochastic algorithms such as stochastic gradient descent (SGD) or stochastic proximal point ... (illustrative sketches of both families of methods are given after this record) |
| Document Type: | doctoral or postdoctoral thesis |
| Language: | English |
| Relation: | https://hdl.handle.net/11567/1177695 |
| DOI: | 10.15167/traore-mohamed-cheik-ibrahim_phd2024-06-10 |
| Availability: | https://hdl.handle.net/11567/1177695 https://doi.org/10.15167/traore-mohamed-cheik-ibrahim_phd2024-06-10 |
| Rights: | info:eu-repo/semantics/openAccess |
| Accession Number: | edsbas.B5E67CF9 |
| Database: | BASE |
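
The asynchronous random block-coordinate descent described in the abstract reduces, in its simplest serial form, to updating one randomly chosen coordinate per iteration. The sketch below is a hypothetical, minimal illustration of that serial core on a smooth convex least-squares objective, with an arbitrary (non-uniform) selection distribution and a coordinate-wise stepsize rule; the asynchronous/delayed-update machinery, the proximal (forward-backward) step, and the infinite-dimensional Hilbert-space setting analysed in the thesis are all omitted, and every name and constant is illustrative.

```python
# Hypothetical toy sketch (not the thesis algorithm): serial random
# coordinate descent on f(x) = 0.5 * ||A x - b||^2 with an arbitrary
# selection distribution and coordinate-wise stepsizes.
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

probs = rng.random(n)
probs /= probs.sum()              # arbitrary (non-uniform) selection probabilities
L = (A ** 2).sum(axis=0)          # coordinate-wise Lipschitz constants of the gradient
steps = 1.0 / L                   # coordinate-wise stepsize rule

x = np.zeros(n)
for k in range(20_000):
    i = rng.choice(n, p=probs)        # draw one coordinate at random
    grad_i = A[:, i] @ (A @ x - b)    # i-th partial derivative of f at x
    x[i] -= steps[i] * grad_i         # update only the selected coordinate

print("residual norm:", np.linalg.norm(A @ x - b))
```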
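For the incremental part, the subject terms list SVRG and SAGA as variance-reduction techniques. The following hypothetical SVRG-style sketch for a finite-sum least-squares problem shows the general mechanism: one randomly chosen summand is used per inner step, and the full gradient at a snapshot point serves as a control variate that reduces the variance of the stochastic gradient. It is only a finite-dimensional illustration under assumed parameters, not the stochastic proximal-point or forward-backward schemes studied in the thesis.

```python
# Hypothetical toy sketch of SVRG-style variance reduction on
# f(x) = (1/N) * sum_i 0.5 * (a_i @ x - b_i)^2.
import numpy as np

rng = np.random.default_rng(1)
N, n = 500, 20
A = rng.standard_normal((N, n))
b = rng.standard_normal(N)

def grad_i(x, i):
    # Gradient of the i-th summand 0.5 * (a_i @ x - b_i)^2.
    return A[i] * (A[i] @ x - b[i])

L_max = (A ** 2).sum(axis=1).max()    # largest per-summand Lipschitz constant
step = 1.0 / (10.0 * L_max)           # conservative, assumed stepsize

x = np.zeros(n)
for epoch in range(30):
    x_snap = x.copy()
    full_grad = A.T @ (A @ x_snap - b) / N    # full gradient at the snapshot
    for _ in range(N):
        i = rng.integers(N)
        # Variance-reduced gradient estimate (control variate at the snapshot).
        g = grad_i(x, i) - grad_i(x_snap, i) + full_grad
        x -= step * g

print("objective:", 0.5 * np.mean((A @ x - b) ** 2))
```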