Multiagent value iteration algorithms in dynamic programming and reinforcement learning

Published in: Results in Control and Optimization, Volume 1; p. 100003
Main author: Bertsekas, Dimitri
Medium: Journal Article
Language: English
Published: Elsevier B.V., 01.12.2020
ISSN: 2666-7207
Description
Summary: We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. In an earlier work we introduced a policy iteration algorithm, where the policy improvement is done one-agent-at-a-time in a given order, with knowledge of the choices of the preceding agents in the order. As a result, the amount of computation for each policy improvement grows linearly with the number of agents, as opposed to exponentially for the standard all-agents-at-once method. For the case of a finite-state discounted problem, we showed convergence to an agent-by-agent optimal policy. In this paper, this result is extended to value iteration and optimistic versions of policy iteration, as well as to more general DP problems where the Bellman operator is a contraction mapping, such as stochastic shortest path problems with all policies being proper.
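
The one-agent-at-a-time improvement described in the summary can be sketched in a few lines of Python. This is a minimal illustration under assumed names and data (n_states, n_agents, n_choices, gamma, P, g, and the random costs and transitions are placeholders, not quantities from the paper): at each state, the agents update their decision components one at a time in a fixed order, each seeing the choices already made by the preceding agents, so a sweep needs a number of one-step evaluations that grows linearly with the number of agents rather than exponentially.

import itertools
import numpy as np

# Illustrative sketch only: n_states, n_agents, n_choices, gamma, P, g and the
# random data below are assumptions, not quantities from the paper.
rng = np.random.default_rng(0)
n_states, n_agents, n_choices, gamma = 5, 3, 2, 0.9

# A joint control is a tuple with one decision component per agent.
joint_actions = list(itertools.product(range(n_choices), repeat=n_agents))

# Random transition distributions P[s][u] and stage costs g[s][u], just to make
# the sketch executable on a small finite-state discounted problem.
P = {s: {u: rng.dirichlet(np.ones(n_states)) for u in joint_actions}
     for s in range(n_states)}
g = {s: {u: rng.uniform(0.0, 1.0) for u in joint_actions}
     for s in range(n_states)}

def lookahead_cost(s, u, J):
    # One-step lookahead cost of joint control u at state s, given cost-to-go J.
    return g[s][u] + gamma * P[s][u] @ J

def agent_by_agent_sweep(J, mu):
    # One sweep in which, at each state, the agents improve their decision
    # components one at a time in a fixed order; each agent sees the choices
    # already made by the preceding agents, so a sweep needs
    # n_agents * n_choices evaluations per state instead of n_choices ** n_agents.
    J_new = np.empty_like(J)
    for s in range(n_states):
        u = list(mu[s])  # start from the current policy's joint control
        for i in range(n_agents):
            u[i] = min(range(n_choices),
                       key=lambda a: lookahead_cost(s, tuple(u[:i] + [a] + u[i + 1:]), J))
        mu[s] = tuple(u)
        J_new[s] = lookahead_cost(s, mu[s], J)
    return J_new, mu

J = np.zeros(n_states)
mu = {s: joint_actions[0] for s in range(n_states)}
for _ in range(50):  # repeated agent-by-agent sweeps on the discounted problem
    J, mu = agent_by_agent_sweep(J, mu)
print("agent-by-agent cost-to-go:", np.round(J, 3))

The standard all-agents-at-once update would replace the inner loop over agents with a single minimization over joint_actions, whose size grows exponentially with n_agents; the sketch above only illustrates the cost difference, not the paper's convergence analysis.
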
DOI: 10.1016/j.rico.2020.100003