Delay-Tolerant OCO With Long-Term Constraints: Algorithm and Its Application to Network Resource Allocation


Bibliographic Details
Published in: IEEE/ACM Transactions on Networking, Vol. 31, No. 1, pp. 147-163
Main Authors: Wang, Juncheng; Dong, Min; Liang, Ben; Boudreau, Gary; Abou-Zeid, Hatem
Format: Journal Article
Language: English
Published: New York, IEEE, 1 February 2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 1063-6692, 1558-2566
Description
Summary: We consider online convex optimization (OCO) with multi-slot feedback delay. An agent selects a sequence of online decisions to minimize the accumulation of time-varying convex loss functions, subject to short-term and long-term constraints that may be time-varying. Both the convex loss function and the long-term constraint function may experience multiple time slots of feedback delay before being received by the agent. Existing work on OCO under this general setting has focused on the static regret, which measures the gap in losses between an online decision sequence and a time-invariant static offline benchmark. In this work, besides the static regret, we also consider a more practically meaningful metric, the dynamic regret, where the benchmark is a time-varying online optimal decision sequence. We propose an efficient algorithm, termed Delay-Tolerant Constrained-OCO (DTC-OCO), which uses a novel double regularization together with a new penalty mechanism on the long-term constraint violation to tackle the asynchrony between information feedback and decision updates. We obtain upper bounds for its static regret, dynamic regret, and constraint violation, proving that they are sublinear under mild conditions. Furthermore, we consider a variation of DTC-OCO with multi-step gradient descent, and show that it provides improved dynamic regret and constraint violation bounds for strongly convex loss functions. For numerical demonstration, we apply DTC-OCO to a general network resource allocation problem. Our simulation results suggest a substantial performance gain of DTC-OCO over the current best alternative.
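Note: the three metrics named in the summary can be written out in generic OCO notation. This is a sketch under assumed notation, not taken from this record: x_t is the decision at slot t, f_t the loss, g_t the long-term constraint function, X the short-term feasible set, and T the horizon. The article's exact benchmarks may differ, for example by also requiring the benchmark to satisfy the long-term constraints.

```latex
% Static regret: gap to the best fixed decision in hindsight (assumed notation)
\mathrm{Reg}_{\mathrm{s}}(T) = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^{\star}),
\qquad x^{\star} \in \arg\min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x)

% Dynamic regret: gap to the per-slot optimal decision sequence
\mathrm{Reg}_{\mathrm{d}}(T) = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x_t^{\star}),
\qquad x_t^{\star} \in \arg\min_{x \in \mathcal{X}} f_t(x)

% Accumulated long-term constraint violation
\mathrm{Vio}(T) = \sum_{t=1}^{T} g_t(x_t)

% "Sublinear" means each quantity is o(T), so its time average vanishes:
\lim_{T \to \infty} \frac{\mathrm{Reg}_{\mathrm{s}}(T)}{T} = 0
```

Since each f_t(x_t^⋆) is no larger than f_t(x^⋆), the dynamic regret is never smaller than the static regret under these definitions, which is why it is the more demanding benchmark.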
DOI:10.1109/TNET.2022.3188285