Privacy preserving planning in multi-agent stochastic environments
| Published in: | Autonomous agents and multi-agent systems Vol. 36; no. 1 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.04.2022 (Springer Nature B.V) |
| Subjects: | |
| ISSN: | 1387-2532, 1573-7454 |
| Summary: | Collaborative privacy preserving planning (CPPP) has gained much attention in the past decade. CPPP aims to create solutions for multi-agent planning problems where cooperation is required to achieve an efficient solution, without exposing information that an agent considers private in the process. To date, CPPP has focused on domains with deterministic action effects. However, in real-world problems action effects are often non-deterministic, and actions can have multiple possible effects with varying probabilities. In this paper, we introduce Stochastic CPPP (SCPPP), an extension of CPPP to domains with stochastic action effects. We show how SCPPP can be modeled as a Markov decision process (MDP) and how the value-iteration algorithm can be adapted to solve it. This adaptation requires extending value-iteration to support multiple agents and privacy. We then present two adaptations of the real-time dynamic programming (RTDP) algorithm, a popular algorithm for solving MDPs, designed to solve SCPPP problems. The first adaptation, distributed RTDP (DRTDP), yields behavior identical to applying RTDP in a centralized manner to the joint problem. To preserve privacy, DRTDP uses a message-passing mechanism adopted from the MAFS algorithm. The second adaptation, public synchronization RTDP (PS-RTDP), is an approximation of DRTDP. PS-RTDP differs from DRTDP mainly in its message-passing mechanism, sending significantly fewer messages than DRTDP. We experimented on domains adapted from the deterministic CPPP literature by adding different stochastic effects to different actions. The results show that PS-RTDP can reduce the number of messages compared to DRTDP by orders of magnitude, thus improving run-time, while producing policies with similar expected costs. |
|---|---|
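The summary's starting point is modeling the planning problem as an MDP and solving it with value iteration. As a hedged illustration of that single-agent building block only (the paper's actual contribution is its multi-agent, privacy-preserving extension, which is not shown here), the following sketch runs standard value iteration on a small, entirely hypothetical cost-based MDP with stochastic action effects:

```python
# Minimal sketch: classic value iteration on a tiny, hypothetical MDP
# with stochastic action effects and expected-cost minimization.
# States, actions, transitions, and costs below are invented for
# illustration; they do not come from the paper.

STATES = [0, 1, 2]        # state 2 is an absorbing goal state
ACTIONS = ["a", "b"]

# TRANSITION[s][a] = list of (next_state, probability):
# one action, multiple possible effects with varying probabilities.
TRANSITION = {
    0: {"a": [(1, 0.8), (0, 0.2)], "b": [(2, 0.5), (0, 0.5)]},
    1: {"a": [(2, 0.9), (1, 0.1)], "b": [(0, 1.0)]},
    2: {"a": [(2, 1.0)], "b": [(2, 1.0)]},
}
COST = {0: 1.0, 1: 1.0, 2: 0.0}   # unit step cost outside the goal

def value_iteration(epsilon=1e-6):
    """Repeat Bellman backups until the value function converges."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            # Q(s,a) = immediate cost + expected value of successors
            q = [COST[s] + sum(p * V[t] for t, p in TRANSITION[s][a])
                 for a in ACTIONS]
            best = min(q)             # minimize expected cost
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < epsilon:
            return V

V = value_iteration()
# Greedy policy extraction from the converged value function.
policy = {
    s: min(ACTIONS,
           key=lambda a: COST[s] + sum(p * V[t] for t, p in TRANSITION[s][a]))
    for s in STATES
}
```

In this toy instance the backups converge to V(0) = 2 and V(1) = 10/9, with the greedy policy taking "b" in state 0 and "a" in state 1. RTDP, which the paper adapts next, replaces the full state sweep with trial-based backups along sampled trajectories.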
| Bibliography: | ObjectType-Article-1, SourceType-Scholarly Journals-1, ObjectType-Feature-2 |
|---|---|
| ISSN: | 1387-2532, 1573-7454 |
| DOI: | 10.1007/s10458-022-09554-w |