Reinforcement Learning Approaches for the Orienteering Problem with Stochastic and Dynamic Release Dates

Dra. Claudia Archetti
ESSEC Business School in Paris, France

E-commerce markets are booming at remarkable rates. Due to an unprecedented series of lockdowns, billions of people stay at home to prevent the spread of the virus. E-commerce revenue saw a 10% increase in Europe in 2020 due to the pandemic. When it comes to the specific challenges to improve customers’ satisfactions on delivery, one of the crucial features is related to the starting point of the last-mile delivery leg. An important feature of last-mile distribution is that its operations start as soon as parcels are delivered to the final logistic center, typically a city distribution center (CDC). Given the short delivery times requested by the customers, delivery operations typically need to start before all parcels expected to arrive that day are available at the CDC. This raises a question: whether to wait for more or all parcels to be delivered at the CDC or to start the delivery as soon as there is any available parcel and vehicle. In this paper, we focus on the sequential decision making problem related to when to deliver parcels and which parcels to deliver under the assumption that the time at which parcels become available at the CDC (called release dates) is stochastic and dynamic. We introduce a new problem called the Orienteering Problem with Stochastic and Dynamic Release Dates. To address the problem, we employ Monte-Carlo simulation and reinforcement learning. To deal with data uncertainty and to reduce the dimension of the search space, we introduce a batch approach that approximates the value of future states while making use of continuous. We then propose two novel hybrid approaches to derive a decision-making policy. Both of them require multiple scenarios sampled using a direct look-ahead approach, and make use of the batch procedure to approximate the value of future states. They also make a good use of the Integer Linear Programming (ILP) techniques to find the best decision in the (approximated) decision space. In the computational study, we compare the two solution approaches and a myopic benchmark approach.


SHORT BIO

Since September 2021, Claudia Archetti is Full Professor in Operations Research at ESSEC Business School in Paris. She was previously Associate Professor at the University of Brescia. She teaches courses for undergraduate, master and PhD students in OR and logistics. The main areas of the scientific activity are: models and algorithms for vehicle routing problems; mixed integer mathematical programming models for the minimization of the sum of inventory and transportation costs in logistic networks; exact and heuristic algorithms for supply-chain management; reoptimization of combinatorial optimization problems. Claudia Archetti has carried out the scientific activity in collaboration with Italian and foreign colleagues and published joint papers with some of the best researchers at the international level. She is author of more than 90 papers in international journals. She was Area Editor of Computers and Operations Research. She is Associate Editor of Omega, Transportation Science, Networks, TOP and EURO Journal of Computational Optimization and member of the Editorial Board of European Journal of Operational Research. Claudia Archetti is VIP3 of EURO, the Association of European Operational Research Societies, in charge of publications and communication