Constrained Online Convex Optimization With Feedback Delays

In this article, we study constrained online convex optimization (OCO) in the presence of feedback delays, where a decision maker chooses sequential actions without knowing the loss functions and constraint functions a priori . The loss/constraint functions vary with time and their feedback informat...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on automatic control Vol. 66; no. 11; pp. 5049 - 5064
Main Authors: Cao, Xuanyu, Zhang, Junshan, Poor, H. Vincent
Format: Journal Article
Language:English
Published: New York IEEE 01.11.2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN:0018-9286, 1558-2523
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:In this article, we study constrained online convex optimization (OCO) in the presence of feedback delays, where a decision maker chooses sequential actions without knowing the loss functions and constraint functions a priori . The loss/constraint functions vary with time and their feedback information is revealed to the decision maker with delays, which arise in many applications. We first consider the scenario of delayed function feedback, in which the complete information of the loss/constraint functions is revealed to the decision maker with delays. We develop a modified online saddle point algorithm suitable for constrained OCO with feedback delays. Sublinear regret and sublinear constraint violation bounds are established for the algorithm in terms of the delays. In practice, the complete information (functional forms) of the loss/constraint functions may not be revealed to the decision maker. Thus, we further examine the scenario of delayed bandit feedback, where only the values of the loss/constraint functions at two random points close to the chosen action are revealed to the decision maker with delays. A delayed version of the bandit online saddle point algorithm is proposed by utilizing stochastic gradient estimates of the loss/constraint functions based on delayed bandit feedback. We also establish sublinear regret and sublinear constraint violation bounds for this bandit optimization algorithm in terms of the delays. Finally, numerical results for online quadratically constrained quadratic programs are presented to corroborate the efficacy of the proposed algorithms.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0018-9286
1558-2523
DOI:10.1109/TAC.2020.3030743