Horizontal combinations of online and offline approximate dynamic programming for stochastic dynamic vehicle routing

Bibliographic details
Published in: Central European Journal of Operations Research, Vol. 28, No. 1, pp. 279-308
Main author: Ulmer, Marlin W.
Format: Journal Article
Language: English
Published: Springer Berlin Heidelberg (Berlin/Heidelberg), 01.03.2020; Springer; Springer Nature B.V.
ISSN: 1435-246X, 1613-9178
Description
Abstract: Stochastic and dynamic vehicle routing problems are gaining increasing attention in the research community. In these problems, routing plans are dynamically updated based on realizations of stochastic information. Because the corresponding Markov decision processes (MDPs) are complex, calculating optimal policies for these problems is usually not possible, and researchers draw on heuristic methods of approximate dynamic programming (ADP). These methods use simulation to approximate the value of a state and decision in the MDP. The simulations are conducted either offline or online. Offline methods such as value function approximations (VFAs) generally neglect the full detail of the state space due to aggregation. Online methods such as rollout algorithms (RAs) are often unable to capture the decision and transition spaces sufficiently due to runtime limitations. In this paper, we alleviate this tradeoff by combining two ADP methods, an online RA and an offline VFA, in two ways. First, we integrate the VFA as a base policy into the online RA to strengthen the RA's simulations. Second, we limit the RA's simulation horizon and estimate the remaining reward-to-go via the VFA. For two stochastic dynamic routing problems from the literature, we show that this combination outperforms state-of-the-art solutions while simultaneously reducing the required time for online calculations.
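The following minimal Python sketch illustrates the two combinations described in the abstract. It is not the paper's implementation, and everything problem-specific is a hypothetical assumption. A toy single-vehicle request-acceptance problem (`State`, `step`, `T`, `P_REQUEST`) stands in for the paper's routing problems. A hand-written formula `vfa` stands in for a VFA that would normally be trained offline by simulation over an aggregated state space. The sketch shows (1) the VFA-based policy `vfa_policy` serving as the rollout's base policy and (2) each simulation being cut off after `horizon` base-policy decisions, with the VFA estimating the remaining reward-to-go.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy problem (NOT the paper's routing problems): in each period t < T
# a request arrives with probability P_REQUEST and a random reward; the dispatcher
# accepts it (using one unit of vehicle capacity) or rejects it.
T = 20
P_REQUEST = 0.6
MEAN_REWARD = 5.5  # mean of uniform(1, 10)


@dataclass(frozen=True)
class State:
    t: int                    # current period
    capacity: int             # remaining vehicle capacity
    request: Optional[float]  # reward of the pending request, or None


def decisions(s: State) -> List[bool]:
    """Feasible decisions: True = accept, False = reject."""
    if s.request is not None and s.capacity > 0:
        return [True, False]
    return [False]


def step(s: State, accept: bool, rng: random.Random) -> Tuple[float, State]:
    """Apply a decision, then sample the next period's request (stochastic transition)."""
    reward = s.request if accept else 0.0
    capacity = s.capacity - 1 if accept else s.capacity
    nxt = round(rng.uniform(1.0, 10.0), 2) if rng.random() < P_REQUEST else None
    return reward, State(s.t + 1, capacity, nxt)


def vfa(t: int, capacity: int) -> float:
    """Hypothetical stand-in for an offline-trained VFA over the aggregated
    post-decision state (t, capacity); a real VFA would be learned by simulation."""
    expected_requests = P_REQUEST * max(T - t - 1, 0)
    return MEAN_REWARD * min(capacity, expected_requests)


def post_decision_value(s: State, accept: bool) -> float:
    """Immediate reward plus the VFA estimate of the post-decision state."""
    reward = s.request if accept else 0.0
    capacity = s.capacity - 1 if accept else s.capacity
    return reward + vfa(s.t, capacity)


def vfa_policy(s: State) -> bool:
    """Combination 1: the VFA-based policy used as the rollout's base policy."""
    return max(decisions(s), key=lambda a: post_decision_value(s, a))


def rollout_decision(s: State, n_sims: int, horizon: int, rng: random.Random) -> bool:
    """Online rollout: evaluate each candidate decision by averaging simulated returns."""
    best_a, best_q = False, float("-inf")
    for a in decisions(s):
        total = 0.0
        for _ in range(n_sims):
            ret, cur = step(s, a, rng)
            depth = 0
            # Simulate the base policy for at most `horizon` further decisions.
            while cur.t < T and depth < horizon:
                r, cur = step(cur, vfa_policy(cur), rng)
                ret += r
                depth += 1
            if cur.t < T:
                # Combination 2: the horizon was truncated, so the VFA estimates
                # the remaining reward-to-go of the reached (pre-decision) state.
                ret += max(post_decision_value(cur, b) for b in decisions(cur))
            total += ret
        q = total / n_sims
        if q > best_q:
            best_a, best_q = a, q
    return best_a
```

For example, `rollout_decision(State(t=0, capacity=3, request=7.5), n_sims=50, horizon=5, rng=random.Random(42))` runs a truncated, VFA-completed rollout. Passing `horizon=T` instead simulates every episode to the end and uses the VFA only as the base policy. In this sketch, a shorter horizon means fewer simulated steps per decision, which is the kind of online runtime reduction the abstract refers to.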
DOI: 10.1007/s10100-018-0588-x