Horizontal combinations of online and offline approximate dynamic programming for stochastic dynamic vehicle routing

Bibliographic Details
Published in: Central European Journal of Operations Research, Vol. 28, No. 1, pp. 279-308
Main Author: Ulmer, Marlin W.
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 01.03.2020
ISSN: 1435-246X, 1613-9178
Description
Summary: Stochastic and dynamic vehicle routing problems are receiving increasing attention in the research community. In these problems, routing plans are dynamically updated based on realizations of stochastic information. Due to the complexity of the corresponding Markov decision processes (MDPs), calculating optimal policies for these problems is usually not possible, and researchers draw on heuristic methods of approximate dynamic programming (ADP). These methods use simulation to approximate the value of a state and decision in the MDP. The simulations are conducted either offline or online. Offline methods such as value function approximations (VFAs) generally neglect the full detail of the state space due to aggregation. Online methods such as rollout algorithms (RAs) are often unable to capture the decision and transition spaces sufficiently due to runtime limitations. In this paper, we alleviate this tradeoff by combining two ADP methods, an online RA and an offline VFA, in two ways. In addition to integrating the VFA as a base policy into the online RA to strengthen the RA's simulations, we also limit the RA's simulation horizon, estimating the remaining reward-to-go again via the VFA. For two stochastic dynamic routing problems from the literature, we show how this combination outperforms state-of-the-art solutions while simultaneously reducing the required time for online calculations.
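The horizontal combination described in the abstract can be sketched in a few lines of Python. This is a minimal illustrative toy, not the paper's actual implementation: the (time_left, reward) state, the accept/reject decision set, the stochastic transition, the linear VFA weight, and all function names are assumptions made for the example. It shows the two integration points named in the abstract: the VFA acting as the base policy inside the rollout simulations, and the VFA bootstrapping the reward-to-go once the simulation horizon is truncated.

```python
import random

def transition(state, decision, rng):
    """Toy stochastic transition: accepting a request earns a reward of 1
    but consumes a random share of the remaining time budget; rejecting
    only advances time. A state is a (time_left, reward) tuple."""
    time_left, reward = state
    if decision == "accept":
        cost = rng.randint(1, 3)
        if cost <= time_left:
            return (time_left - cost, reward + 1)
        return (0, reward)
    return (time_left - 1, reward)

def vfa(state):
    # Offline component: an aggregated value function approximation.
    # Here a toy linear estimate of the reward-to-go in the remaining
    # time budget (the weight 0.4 is an illustrative assumption).
    time_left, _ = state
    return 0.4 * max(time_left, 0)

def base_policy(state):
    # VFA-greedy base policy used inside the rollout simulations,
    # peeking one step ahead with an assumed expected accept cost of 2.
    time_left, reward = state
    accept_val = 1 + vfa((time_left - 2, reward + 1))
    reject_val = vfa((time_left - 1, reward))
    return "accept" if accept_val >= reject_val else "reject"

def truncated_rollout(state, horizon=3, n_sims=20, seed=0):
    """Online component: for each candidate decision, run short
    simulations under the VFA base policy, truncate them after
    `horizon` steps, and bootstrap the remaining reward-to-go
    with the offline VFA at the simulation frontier."""
    rng = random.Random(seed)
    best_decision, best_value = None, float("-inf")
    for decision in ("accept", "reject"):
        total = 0.0
        for _ in range(n_sims):
            s = transition(state, decision, rng)
            for _ in range(horizon):
                if s[0] <= 0:
                    break
                s = transition(s, base_policy(s), rng)
            total += s[1] + vfa(s)  # collected reward + estimated reward-to-go
        value = total / n_sims
        if value > best_value:
            best_decision, best_value = decision, value
    return best_decision
```

Calling `truncated_rollout((10, 0))` selects a decision for a state with 10 time units remaining and no reward collected yet; shortening `horizon` shifts more of the evaluation from the online simulations to the offline VFA, which is the runtime-versus-detail tradeoff the paper addresses.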
DOI:10.1007/s10100-018-0588-x