Distributed Nesterov-like gradient algorithms

Bibliographic Details
Published in: 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), pp. 5459-5464
Main Authors: Jakovetic, Dusan; Moura, J. M. F.; Xavier, J.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2012
ISBN: 9781467320658, 146732065X
ISSN: 0191-2216
Description
Summary: In classical, centralized optimization, the Nesterov gradient algorithm reduces the number of iterations needed to produce an ε-accurate solution (in terms of the cost function), compared with the ordinary gradient method, from O(1/ε) to O(1/√ε). This improvement is achieved on the class of convex functions with Lipschitz continuous first derivative, and it comes at a very small additional computational cost per iteration. In this paper, we consider distributed optimization, where nodes in a network cooperatively minimize the sum of their private costs subject to a global constraint. To solve this problem, recent literature proposes distributed (sub)gradient algorithms, which are attractive because of their computationally inexpensive iterations but converge slowly: the ε error is achieved in O(1/ε²) iterations. Here, building on the Nesterov gradient algorithm, we present a distributed, constant-step-size, Nesterov-like gradient algorithm that converges much faster than existing distributed (sub)gradient methods, with zero additional communications and very small additional computations per iteration k. We show that our algorithm converges to a solution neighborhood such that, for a convex compact constraint set and an optimized step size, the convergence time is O(1/ε). We achieve this on a class of convex, coercive, continuously differentiable private costs with Lipschitz first derivative. We derive our algorithm through a useful penalty reformulation of the original problem based on the network's Laplacian matrix (referred to as the clone problem); the proposed method is precisely the Nesterov gradient method applied to the clone problem. Finally, we illustrate the performance of our algorithm on distributed learning of a classifier via the logistic loss.
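The record does not reproduce the paper's pseudocode, so the following is only a minimal sketch of a generic distributed Nesterov-like gradient iteration of the kind the abstract describes: each node mixes its neighbors' auxiliary variables with consensus weights, takes a local gradient step on its private cost, and applies a Nesterov-style extrapolation. The quadratic private costs, ring network, weight matrix W, step size alpha, and momentum schedule beta are illustrative assumptions, not values from the paper, and the projection onto the constraint set is omitted.

import numpy as np

# Illustrative setup (assumed, not from the paper): 5 nodes on a ring,
# each with a private quadratic cost f_i(x) = 0.5 * ||A_i x - b_i||^2.
rng = np.random.default_rng(0)
n_nodes, dim = 5, 3
A = rng.standard_normal((n_nodes, dim, dim))
b = rng.standard_normal((n_nodes, dim))

def grad(i, x):
    # Gradient of node i's private cost at x.
    return A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic consensus weights for the ring network.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

alpha = 0.01                      # constant step size (illustrative)
x = np.zeros((n_nodes, dim))      # row i is node i's estimate
y = x.copy()                      # auxiliary (momentum) variable

for k in range(2000):
    grads = np.array([grad(i, y[i]) for i in range(n_nodes)])
    x_new = W @ y - alpha * grads     # consensus mixing + local gradient step
    beta = k / (k + 3.0)              # illustrative Nesterov momentum schedule
    y = x_new + beta * (x_new - x)    # Nesterov-like extrapolation
    x = x_new

# Node estimates should be close to one another and near a minimizer
# of the aggregate cost.
print(np.round(x, 3))

With a small constant step size, the node estimates end up approximately in agreement near a minimizer of the sum of the private costs, consistent with the abstract's statement of convergence to a solution neighborhood.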
DOI: 10.1109/CDC.2012.6425938