Constrained Online Convex Optimization With Feedback Delays


Bibliographic Details
Published in:	IEEE Transactions on Automatic Control, Vol. 66, No. 11, pp. 5049-5064
Main Authors:	Cao, Xuanyu; Zhang, Junshan; Poor, H. Vincent
Format:	Journal Article
Language:	English
Published:	New York: IEEE, 1 November 2021; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Bandit feedback; Benchmark testing; Computational geometry; constrained optimization; Constraints; Convex analysis; Convexity; Decision making; Decision theory; Delays; Feedback; feedback delay; function feedback; Functionals; online convex optimization (OCO); Optimization; Saddle points; Time factors
ISSN:	0018-9286, 1558-2523
Online Access:	Full text

Description
Summary:	In this article, we study constrained online convex optimization (OCO) in the presence of feedback delays, where a decision maker chooses sequential actions without knowing the loss functions and constraint functions a priori. The loss/constraint functions vary with time, and their feedback information is revealed to the decision maker with delays, as happens in many applications. We first consider the scenario of delayed function feedback, in which the complete information of the loss/constraint functions is revealed to the decision maker with delays. We develop a modified online saddle point algorithm suitable for constrained OCO with feedback delays. Sublinear regret and sublinear constraint violation bounds are established for the algorithm in terms of the delays. In practice, the complete information (functional forms) of the loss/constraint functions may not be revealed to the decision maker. Thus, we further examine the scenario of delayed bandit feedback, where only the values of the loss/constraint functions at two random points close to the chosen action are revealed to the decision maker with delays. A delayed version of the bandit online saddle point algorithm is proposed, which uses stochastic gradient estimates of the loss/constraint functions based on delayed bandit feedback. We also establish sublinear regret and sublinear constraint violation bounds for this bandit optimization algorithm in terms of the delays. Finally, numerical results for online quadratically constrained quadratic programs are presented to corroborate the efficacy of the proposed algorithms.
DOI:	10.1109/TAC.2020.3030743
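
Illustration:	The summary describes a saddle point (primal-dual) update driven by delayed feedback, with gradients estimated from function values at two nearby points. The following is a minimal sketch of that general idea only, not the article's algorithm. Everything in it is an assumption made for illustration: the names `two_point_grad` and `delayed_saddle_point`, the one-dimensional decision variable, the fixed delay, the constant step sizes `alpha` and `mu`, the interval `[lo, hi]`, and the use of time-invariant functions. It also uses the constraint value at the stale action directly in the dual step, which is a simplification of true bandit feedback.

```python
import random
from collections import deque


def two_point_grad(h, x, delta, rng):
    """Estimate h'(x) from the two values h(x + delta*u) and h(x - delta*u).

    In one dimension the random unit direction u is just +1 or -1
    (an illustrative simplification of a two-point bandit estimate).
    """
    u = rng.choice((-1.0, 1.0))
    return (h(x + delta * u) - h(x - delta * u)) / (2.0 * delta) * u


def delayed_saddle_point(f, g, delay, rounds, alpha=0.05, mu=0.05,
                         lo=-5.0, hi=5.0, delta=1e-3, seed=0):
    """Hypothetical primal-dual loop for: minimize f(x) subject to g(x) <= 0.

    Feedback for the action played in round t only becomes usable in
    round t + delay. All parameter names and defaults are assumptions.
    """
    rng = random.Random(seed)
    x, lam = 0.0, 0.0        # primal action and dual multiplier
    pending = deque()        # actions whose feedback has not arrived yet
    for _ in range(rounds):
        pending.append(x)    # play the current action
        if len(pending) > delay:
            xs = pending.popleft()  # stale action whose feedback arrives now
            grad_f = two_point_grad(f, xs, delta, rng)
            grad_g = two_point_grad(g, xs, delta, rng)
            # Primal descent step on the Lagrangian, projected onto [lo, hi].
            x = min(hi, max(lo, x - alpha * (grad_f + lam * grad_g)))
            # Dual ascent step, projected onto the nonnegative reals.
            lam = max(0.0, lam + mu * g(xs))
    return x, lam


# Toy problem: minimize (x - 2)^2 subject to x - 1 <= 0, feedback delayed by 3 rounds.
x_final, lam_final = delayed_saddle_point(
    lambda x: (x - 2.0) ** 2, lambda x: x - 1.0, delay=3, rounds=2000)
```

On this toy problem the constrained minimizer is x = 1 with multiplier 2, so the returned pair should settle near those values. The small constant step sizes are chosen here only to keep the delayed loop stable and are not taken from the article.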