IJPR MIM 2019 Revised 030720
a LAMIH, UMR CNRS 8201, Université Polytechnique Hauts-de-France, Valenciennes, France
b LS2N, UMR CNRS 6004, Université de Nantes, Nantes, France
Abstract
Supply chains are complex, stochastic systems. Nowadays, logistics managers face
a major problem: increasingly diverse and variable customer demand that is
difficult to predict. Classical forecasting methods implemented in many business
units have limitations with regard to fluctuating demand and the complexity of
fully connected supply chains. Machine Learning methods have been proposed to
improve prediction. In this paper, a Long Short-Term Memory (LSTM) recurrent
neural network is proposed for demand forecasting in a Physical Internet supply
chain network. A hybrid genetic algorithm and scatter search metaheuristics are
also proposed to automate the tuning of the LSTM hyperparameters. To assess
the performance of the proposed method, a real case study on agricultural
products in a supply chain in Thailand was considered. Accuracy and the
coefficient of determination were the key performance indicators used to
compare the performance of the proposed method with other supervised learning
methods: ARIMAX, Support Vector Regression, and Multiple Linear Regression.
The results demonstrate the superior forecasting efficiency of the LSTM method
with continuously fluctuating demand, whereas the other methods perform better
with less varied demand. The empirical results of the hybrid metaheuristics show
that performance in terms of accuracy and computational time is higher than with
the trial-and-error method.
1. Introduction
In this study, the demand forecasting problem was tackled within the context of
the physical internet. The physical internet network was inspired by a real case study
relating to the distribution of agricultural products in Thailand. Given the increasing
variety and changing data flows, researchers have adopted innovative, hybrid methods.
As classical forecasting techniques have shown their limits, a new approach is proposed
based on learning techniques.
The main contributions of the paper are:
- A forecasting approach based on Long Short-Term Memory (LSTM) in the
context of the Physical Internet. This approach was compared with classical
ones.
- A genetic algorithm and scatter search hybrid metaheuristic to automate the
tuning of the LSTM hyperparameters to improve its efficiency.
- The proposed model was tested on a real case study of agricultural products
in a supply chain in Thailand.
This paper is divided into six sections. This section introduces the paper. Section
2 reviews the literature on forecasting models. Section 3 details the problem statements
and assumptions. Section 4 details the methodology, the implementation of the
proposed forecasting approach, and the parameter tuning technique. Section 5 presents
the comparison between the proposed approach and classical forecasting models, as
well as the results of the parameter tuning process. The conclusion and some future
lines of research are given in Section 6.
2. Literature review
The literature review is structured as follows. Firstly, some forecasting models
are presented with their advantages and limitations. Secondly, the main metaheuristics
used to improve the forecasting models are reviewed, especially those used to tune the
model hyperparameters. For each forecasting method, a short description and relevant
applications are presented.
2.1 Forecasting models
Demand forecasting is an important issue and a fundamental step in supply chain
management. It consists in estimating the consumption of products or services for
upcoming periods, making it possible to plan activities and thus, for example,
reduce delivery times, adjust stock levels, and optimize operating costs. Forecasting is not
easy, especially for complex, open systems such as the Physical Internet (PI). Indeed, there is no totally safe
and reliable method, and forecasting can affect many decisions. Forecasting methods
are primarily based on historical data (quantitative methods), assessments or estimates
(qualitative methods), or a mixture of both. Quantitative methods can be based on the
historical sequence of observed demand (time-series models), some exogenous
parameters that can affect the performance of the model (causal model), or both. Many
forecasting models are implemented and tested with time-series data. Classical methods
such as Moving Average, the Naïve Approach, or Exponential Smoothing are commonly
used to forecast trends in time-series data (Box and Jenkins 1970). However, to
forecast non-linear trends, machine learning methods can perform better than
these classical methods (Carbonneau, Laframboise, and Vahidov 2008).
Time-series models are typically developed using historical values. They are
easy to model, can provide predictions over a specific period, and use the difference
between the predicted and real values in the immediate past to tune the model
parameters. However, some of them do not capture the effect of other factors that could
affect demand, such as demand at other nodes in the PI network, stock levels in
PI hubs, or the unit price of each product. Neural Networks (NN) are designed to learn the relationship
between these factors and demand in a non-statistical approach. NN-based
methodologies do not require any predefined mathematical models, but model
tuning is costly. If there are patterns embedded in the data, NNs can capture
them with minimal error. Other statistical methods have the advantage of
providing relatively inexpensive forecasting models that only require historical data. However, the
accuracy of prediction of these models drops significantly when the time horizon is
extended, when the trends are not linear, or in the presence of some exogenous factors.
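As a minimal illustration of the classical baselines named above, the Naïve Approach and Moving Average can be sketched as follows (the demand series and function names are illustrative, not from the case study):

```python
# Minimal sketches of two classical baselines: the Naive Approach and the
# Moving Average. The toy demand series is illustrative only.

def naive_forecast(history):
    """Naive Approach: the forecast equals the last observed demand."""
    return history[-1]

def moving_average_forecast(history, window=3):
    """Moving Average: mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

demand = [120.0, 135.0, 128.0, 140.0, 132.0]
print(naive_forecast(demand))            # 132.0
print(moving_average_forecast(demand))   # mean of the last three values
```

Such baselines capture level and short-run trend cheaply but, as noted above, degrade with non-linear trends and exogenous factors.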
Two main groups of forecasting models were considered: Classical Forecasting
methods and Neural Networks. These were compared and show the relevance of the
NN-based group for an open logistics system in the context of PI.
factors. Furthermore, they have been widely used and implemented in real cases. For
example, the authors in (Carbonneau, Laframboise, and Vahidov 2008) implemented
SVR and MLR as benchmark models with a Recurrent Neural Network for demand
prediction of foundry data in Canada. The authors in (Aburto and Weber 2007; Ryu,
Noh, and Kim 2016) implemented an ARIMA model as a benchmark with a neural
network model to train and predict customer demand.
The mathematical formulation and some additional applications for ARIMAX,
SVR, MLR, and LSTM can be found in Appendix 1.
forecasting performance with the Relative Mean Square Error and found the
performance was better compared to the RBF neural network.
The Random Walk and Exponential Triple Smoothing (ETS) models are
interesting benchmarks for recurrent neural networks. However, some constraints make
these two models incompatible with this experiment. Firstly, the ETS (Taylor 2010; A.
2016) and the Random Walk (Tyree and Long 1995; Nag and Mitra 2002) models are
fitted with a univariate input factor and the inputs in this experiment are multivariate:
unit price and historical daily demand. Secondly, the Random Walk model only
considers the last observed values, whereas LSTM considers the variants of time lags in
the prediction (A. 2016).
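The multivariate, time-lagged input described above (historical daily demand plus unit price) can be illustrated with a simple sliding-window transformation; the function and variable names here are ours, not the authors':

```python
# Illustrative sliding-window transformation: each sample stacks the last
# `lag` days of demand and unit price to predict the next day's demand.
# Function and variable names are ours, not the authors'.

def make_lagged_samples(demand, price, lag):
    X, y = [], []
    for t in range(lag, len(demand)):
        features = []
        for k in range(t - lag, t):           # the `lag` preceding days
            features.extend([demand[k], price[k]])
        X.append(features)
        y.append(demand[t])                   # target: demand on day t
    return X, y

demand = [10, 12, 11, 13, 14, 15]
price = [1.0, 1.1, 1.0, 1.2, 1.1, 1.3]
X, y = make_lagged_samples(demand, price, lag=2)
# X[0] is [10, 1.0, 12, 1.1] and y[0] is 11
```

The time lags 2, 4, and 6 used in the experiments correspond to different values of `lag` in this transformation.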
the main advantage of learning the patterns in the data and the relationship between
inputs and outputs using a non-statistical approach. NN-based approaches in forecasting
do not require any predefined mathematical models. They try to capture, memorize, and
use the inner patterns or relationships to make predictions.
NNs mimic how biological neurons operate, communicate, and learn. A NN is
made of several layers of interconnected neurons. A specific learning algorithm governs
the learning process. This training process changes the weights across the network until
the network is identified as an optimal model that explains the patterns and links
between the variables.
NN models are one of the most popular models for forecasting non-linear
behaviour in Supply Chains (Carbonneau, Laframboise, and Vahidov 2008). More
particularly, Recurrent Neural Networks exhibit good performances with complex
forecasting problems such as financial data, production capacity, retailer transactions, or
any complex time-series data. Long Short-Term Memory (LSTM) is one of the highest
performing recurrent neural network models. In LSTM, the concept of a memory cell
(Greff et al. 2017; Sagheer and Kotb 2019) is used to build the neural network structure.
LSTM networks, i.e. recurrent neural networks (RNN) with both short-term and
long-term memory, are among the most successful RNN architectures. They have
enjoyed enormous popularity in many applications and domains, including
forecasting problems.
Both LSTM and other RNN are fundamentally different from traditional feed-forward
neural networks. They are trained by backpropagation through time (BPTT)
(Werbos 1990). These sequence-based models can establish temporal correlations
between the previous information and the current circumstances. This characteristic is
ideal for demand forecasting problems, as the effects of past demand and historical
values of exogenous factors on future demand can be modelled. Indeed, in a supply
chain, demand not only depends on past values but also on the present and past values
of other factors in the chain.
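To make the memory-cell mechanism concrete, one LSTM time step can be written out in plain Python; the shared scalar weights below are arbitrary placeholders, not trained values:

```python
import math

# One step of a single LSTM memory cell, scalar version for illustration.
# The gates decide what to forget, what to store, and what to output.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w=0.5, u=0.1, b=0.0):
    f = sigmoid(w * x + u * h_prev + b)     # forget gate
    i = sigmoid(w * x + u * h_prev + b)     # input gate
    g = math.tanh(w * x + u * h_prev + b)   # candidate cell value
    o = sigmoid(w * x + u * h_prev + b)     # output gate
    c = f * c_prev + i * g                  # cell state: long-term memory
    h = o * math.tanh(c)                    # hidden state: short-term memory
    return h, c

# Unrolling over a short sequence: the state carries past demand forward,
# which is what BPTT trains through time.
h, c = 0.0, 0.0
for x in [0.2, 0.5, 0.1]:
    h, c = lstm_step(x, h, c)
```

A real LSTM layer uses separate weight matrices per gate; sharing one scalar weight here only keeps the sketch short.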
Much research has implemented recurrent neural networks, especially LSTM
models, for predictions with time-series data. Navya (2011) proposed an Artificial
Neural Network to forecast the future trading volume of agricultural commodities. In
terms of accuracy and inequality, the method outperformed the MLR and ARIMA
approaches. Sagheer and Kotb (2019) proposed an LSTM recurrent neural network to
forecast the future production rate of petroleum products. In (Kantasa-ard et al. 2019),
LSTM also outperformed other approaches in predicting white sugar consumption in
Thailand.
As stated before, few studies deal with the forecasting problem in the context of
the physical internet, especially using NN techniques. The authors in (Qiao, Pan, and
Ballot 2019), for example, proposed a dynamic pricing model based on forecasting the
quantity of transported requests in the next auction periods. The objective was to
maximize the total profit of the transportation rounds. In a previous study (Kantasa-ard
et al. 2019), LSTM was used to predict white sugar consumption in Thailand in the
context of a PI network.
The literature is full of studies on forecasting techniques, mainly quantitative
methods. Of these methods, the most important in classical regression are MLR,
ARIMAX, and SVR. Of the NN-based methods, LSTM performs best (Sagheer and
Kotb 2019; Chen, Zhou, and Dai 2015). Table 1 summarizes the characteristics of these
models. The first column provides the Model name, followed by its group in the second
column. The third column recaps the model characteristics. The last three columns
provide a comparison of the models according to the most commonly encountered
criteria in the literature (Cao, Li, and Li 2019; Aburto and Weber 2007; Carbonneau,
Laframboise, and Vahidov 2008): performance with complex data, training period, and
performance with a non-linear trend. Performance with complex data concerns the
accuracy as well as the ability of the model to handle many factors. The training period
relates to the computational time during the training phase. The performance with a
non-linear trend shows how a model can capture the patterns in the data, especially non-
linear relations. The number of “+” in Table 1 shows the quality of each indicator.
These three indicators are highlighted because of the characteristics of the agricultural
datasets used in our experiments.
Forecasting Model   Model Group   Characteristics   Complex data   Training period   Non-linear trend
2.2 Metaheuristic methods for Neural Network parameter tuning
Trial and error is the most common approach for hyperparameter tuning in
forecasting models. However, it takes a long time to find an appropriate set of
parameters for the model, and there is no guarantee that the resulting solution
will be good (Kim and Shin 2007).
Metaheuristic methods are an interesting way of reducing the time spent on
hyperparameter tuning. Ojha and his research team, for instance, showed that
metaheuristics such as genetic algorithms, particle swarm optimization, and ant
colony optimization are good exploitation and exploration tools for tuning
hyperparameters in feed-forward neural networks (Ojha, Abraham, and Snášel
2017). However, no single
method can handle all tuning problems perfectly. Therefore, the hybrid metaheuristic
solution was put forward to improve the performance of the tuning phase. Indeed,
the tuning problem is complex for NN in general and for RNN in particular: many
behaviours need to be captured, and collaboration between two or more heuristics
should be beneficial. In the following, the focus is on two metaheuristics: Genetic Algorithm
and Scatter Search.
Appendix 2 provides more details on the principle of the Genetic Algorithm and
Scatter Search.
instance, are chosen randomly from the hyperparameter dictionary. As the network
parameters are generated from similar components in the dictionary, premature
convergence or local minima can occur before reaching the best solution (Dib et al.
2017). Therefore, constructing a hybrid method is a promising way to improve the
performance of the network structure and prevent premature convergence. To this
end, Scatter Search is a suitable complementary heuristic; it is described in the next section.
2.2.2 Scatter Search
Scatter Search (SS) is another metaheuristic method for constructing new
solutions based on the integration of existing or reference solutions (Laguna and Marti
2003). The purpose is to improve the performance of the solutions generated with the
various elements in the solution space.
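The core combination step of Scatter Search can be sketched as follows, assuming a hypothetical list encoding of LSTM hyperparameters (the encoding and values are illustrative):

```python
import random

# Sketch of the Scatter Search combination step: new candidates are built
# by mixing components of reference solutions (Laguna and Marti 2003).
# The list encoding of LSTM hyperparameters below is a hypothetical example.

def combine(ref_a, ref_b):
    """Take each component from one of the two reference solutions."""
    return [random.choice(pair) for pair in zip(ref_a, ref_b)]

random.seed(0)
ref_set = [
    [2, 64, "relu", "adam"],        # (layers, units, activation, optimizer)
    [3, 128, "tanh", "rmsprop"],
]
child = combine(ref_set[0], ref_set[1])
# every component of `child` comes from one of the two references
```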
Many studies propose this heuristic to improve their NN. Laguna and Martí
(2006), for example, implemented the concept of Scatter Search to train a single hidden
layer of a feed-forward neural network. They also compared the performance of the
Scatter Search with the classical backpropagation and extended Tabu Search methods
for around 15 instances. The results show that Scatter Search performs better with a
higher number of instances. Cuéllar, Delgado, and Pegalajar (2007) benchmarked their
hybrid training method of a recurrent neural network against a scatter search. Their
method produced the same good results as the scatter search.
The potential of scatter search was exploited in this paper to build a hybrid
metaheuristic with a genetic algorithm for tuning the hyperparameters of the LSTM.
With this in mind, the problem statement and assumptions of this research are
presented in the next section. In addition, the results of implementing Scatter
Search and a Genetic Algorithm are presented in the results and analysis section
(section 5.1).
distributors and the retailers are based in each city or sub-region. Recent
experience has shown that it is not practical to balance customer demand and
stock levels at the distributors in the region. The research question is how to
balance customer demand and stock levels between fully connected distribution
centres and retailers in the supply chain.
The concept of PI has never been implemented in the context of the agricultural
product supply chain in Thailand. Therefore, the quantity of commodity crops
must be anticipated accurately enough to serve the retailers in the region,
based on the proposed forecasting model. Furthermore, the distribution flow of
the forecast demand for agricultural products was simulated by implementing the
concept of PI. The forecasting details and the simulation model are described in
the methodology section.
The experimental data were obtained from the Thai Office of Agriculture for the
period from January 2010 to December 2017 (OAE Thailand 2019). There were two
main assumptions for customer demand in this experiment.
- Firstly, daily demand was generated randomly from the monthly quantity of
commodity crops: pineapple, cassava, corn.
- Secondly, the total daily demand generated was equal to the monthly quantity
of commodity crops, based on an equal probability each day.
Customer demand, in this experiment, included all retailers in the northern
region.
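One way to realize these two assumptions is sketched below: each unit of the monthly quantity is assigned to a uniformly random day, so the daily values are random but sum exactly to the monthly total. This is an illustration, not necessarily the authors' exact generator.

```python
import random

# Sketch of a daily-demand generator satisfying both assumptions: random
# daily values with equal probability per day, summing exactly to the
# monthly quantity. Illustrative, not the authors' exact procedure.

def generate_daily_demand(monthly_quantity, days):
    daily = [0] * days
    for _ in range(int(monthly_quantity)):
        daily[random.randrange(days)] += 1   # each unit lands on a uniform day
    return daily

random.seed(42)
demand = generate_daily_demand(31000, 31)
# sum(demand) == 31000 by construction
```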
Regarding the distribution flow, in the example PI network presented in figure 1,
it was assumed that there were one production line, three PI-hubs, and two
retailers in the lower northern region of Thailand. All the components
(production line, PI-hubs, retailers) are interconnected.
Figure 1. Example of a distribution network in the context of the Physical Internet in the
lower northern region of Thailand.
4. Methodology
In this section, Figure 2 provides an overview of the proposed approach based on the
aforementioned problem statements and assumptions. Three items can be distinguished:
- Firstly, an appropriate forecasting model (item #1) was investigated taking
into account fluctuation in demand. The concept of the forecasting model is
to plan sufficient resources for all the parties in the chain. An LSTM
Recurrent Neural Network was considered and was implemented using
Python language with Keras and Sci-kit libraries.
- Secondly, automated tuning of the relevant parameters was proposed to
improve the performance of the forecasting model (item #2). The hybrid
metaheuristic used was constructed using a combination of a Genetic
Algorithm (GA) and Scatter Search, which replaced the GA Mutation
process.
- Thirdly, a simulation of a Physical Internet network using forecasting data
was conducted to investigate how to plan resources in a complex chain (item
#3) and to assess the effectiveness of the forecast data on reducing holding
and transportation costs. The simulation was performed using the NetLogo
multi-agent platform (Nouiri, Bekrar, and Trentesaux 2018).
Figure 2. Research structure flow chart
The details of the proposal are investigated in the following section. As per
figure 2, this section details successively the forecasting model, parameter tuning using
hybrid metaheuristics, and the simulation of the Physical Internet network.
RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - Y_i)^2} \quad (1)
These scores measure the accuracy and the goodness of fit between the real and
the predicted values (Shafiullah et al. 2008; Bala 2010; Acar and Gardner 2012). If
these scores are smaller, the deviation between the real and the predicted values is
smaller too. R-Squared (R2), another evaluation factor, measures the degree of
association between two variables in such a model (Cao, Li, and Li 2019). In this case,
the variables are the real and predicted values (see equation (6)).
R^2 = 1 - \frac{\sum_{i=1}^{n}(X_i - Y_i)^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \quad (6)
The unit root score obtained using the ADF (Augmented-Dickey Fuller test)
determines if the predicted data is stationary or non-stationary. The null hypothesis of
ADF is H 0 : ρ=1 , which means the sequence is non-stationary if root ρ is equal to one.
Therefore, to reject the null hypothesis, i.e. to conclude that the data is
stationary, the root ρ should be less than one and the ADF statistic should be
more negative than the critical value.
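As a minimal sketch, RMSE (equation (1)) and R-Squared (equation (6)) can be computed as follows, with X the real values and Y the predicted values; the toy series is illustrative:

```python
import math

# Sketch of the two scores used here: RMSE (equation (1)) and R-Squared
# (equation (6)), with X the real and Y the predicted values.

def rmse(real, pred):
    n = len(real)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(real, pred)) / n)

def r_squared(real, pred):
    mean_real = sum(real) / len(real)
    ss_res = sum((x - y) ** 2 for x, y in zip(real, pred))
    ss_tot = sum((x - mean_real) ** 2 for x in real)
    return 1.0 - ss_res / ss_tot

real = [100.0, 110.0, 120.0, 130.0]
pred = [102.0, 108.0, 121.0, 128.0]
print(round(rmse(real, pred), 3))        # 1.803
print(round(r_squared(real, pred), 3))   # 0.974
```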
After constructing the forecasting model, another important task to consider was
tuning the model parameters.
As described previously, a relevant process for tuning the hyperparameters of the
LSTM model (number of hidden layers, number of neural units in each layer,
activation function, and optimizer function) is needed to optimise its
efficiency. With manual tuning, it took a long time to choose appropriate
parameters for each dataset, and the loss value remained high.
Hyperparameters are generally chosen by trial and error, which means trying all
possible combinations to tune the hyperparameters of the forecasting model
structure.
Some studies propose metaheuristics to tune neural network parameters, as
mentioned in the literature review. The principle of implementing a hybrid
metaheuristic is shown in figure 3 and the details are outlined below. The
genetic algorithm component was inspired by (Harvey 2017). The input data
(historical daily demand and unit price of each product) and the predicted
outputs (daily demand for the next period) were used to choose the
hyperparameters.
Firstly, the algorithm starts the solution encoding by randomly generating the
population of LSTM hyperparameter network structures. In this case, four
hyperparameters were considered to construct the network structure: the number of
hidden layers, the number of neural units in each layer, activation functions, and
optimizer functions. These hyperparameters are the main parameters affecting the
performance of the forecasting model. Once the set of hyperparameter networks has
been generated, all the networks are trained and the algorithm returns a fitness score,
which is the loss value for each network. The network structures are then displayed in
descending order starting with the highest fitness score. The algorithm also checks
whether the process runs until the last network generation is reached or not. If the
generation is not the last one, the performance of all the networks will be improved
through the selection, crossover, and mutation processes of the genetic algorithm.
Details of the genetic algorithm are provided in figure 4.
(A)
(B)
Figure 4. Process overview of a Hybrid Genetic Algorithm and Scatter Search (A);
Example network structures in selection, crossover, and mutation (B)
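The tuning loop described above can be sketched as follows; the hyperparameter search space is a plausible example and the fitness function is a placeholder standing in for the validation loss of the trained LSTM:

```python
import random

# Sketch of the hybrid tuning loop: a population of LSTM hyperparameter
# configurations is evolved by selection and crossover, with the GA
# mutation step replaced by a scatter-search-style recombination of
# elites. `fitness` is a placeholder for the LSTM validation loss.

SPACE = {
    "hidden_layers": [1, 2, 3],
    "units": [32, 64, 128, 256],
    "activation": ["relu", "tanh", "sigmoid"],
    "optimizer": ["adam", "rmsprop", "sgd"],
}

def random_network():
    return {k: random.choice(v) for k, v in SPACE.items()}

def fitness(net):
    # Placeholder loss; in the paper this comes from training the LSTM.
    return net["hidden_layers"] * 0.1 + 1.0 / net["units"]

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def scatter_combine(elites):
    # Scatter-search step: recombine components drawn from elite solutions.
    return {k: random.choice([e[k] for e in elites]) for k in SPACE}

random.seed(1)
pop = [random_network() for _ in range(8)]
for _ in range(5):                         # generations
    pop.sort(key=fitness)                  # lower loss ranks first
    elites = pop[:3]
    children = [crossover(*random.sample(elites, 2)) for _ in range(4)]
    children.append(scatter_combine(elites))
    pop = elites + children
best = min(pop, key=fitness)
```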
As shown in figure 5, the simulation provides the daily variation in holding and
transportation costs.
Figure 5. Screenshot of the simulation model in the physical internet supply chain
The holding and transportation costs of real and forecast demand were
compared. A small deviation between the real and forecast results proves the
effectiveness of our proposed approach. An effective forecasting model leads to good
resource planning and, therefore, a decrease in supply chain costs. The same simulator
and the same configuration were used to simulate the predicted and real demand. The
configurations of the simulation model are detailed below.
Details of the configuration of the simulation model
As the PI concept is based on full connectivity between PI-hubs, a replenishment rule
needs to be chosen. In our simulation model, the replenishment policy was the same in
both experiments. The closest hub was always selected as a good replenishment node to
fulfil retailer demand. There were three main assumptions for the simulation.
- The order quantity of each retailer on each day was equal to daily demand.
- Each distribution hub had its own trucks and managed them separately.
- The stock levels at PI-hubs were sufficient for all orders (i.e. the initial
stock level at each hub was greater than the total predicted quantity).
In accordance with the assumptions in section 3, the predicted daily demand of
two retailers was used to calculate the transportation and holding costs for the predicted
daily demand in the simulation. After the delivery of retailer orders, the stock levels at
the hub were updated daily. The distance travelled by the trucks during delivery was
also updated. The holding and transportation costs were calculated and updated using
the equations below, where T is a daily period.
Holding cost:
total_holding_cost = \sum_{t=1}^{T} daily_holding_cost_hub(t)
daily_holding_cost_hub = inventory_stock × 180
Transportation cost:
total_transportation_cost = \sum_{t=1}^{T} daily_transportation_cost_truck(t)
daily_transportation_cost_truck = travelled_distance × demand_quantity × 1.85
The unit holding cost was equal to 180 THB or €5.20 per m³ (based on the
Integrated Logistics Services Thailand 2019) and the unit transportation cost
was equal to 1.85 THB or €0.053 per km per ton (based on the Bureau of Standards
and Evaluation 2016). The simulation model was tested based on the predicted demand over
16 days and over 31 days. Then, the results (holding and transportation costs based on
predicted demand) were compared to the costs of real demand for the same period. The
main reason for focusing on 16 days and 31 days was to validate the deviation between
predicted and real demand based on different volumes of daily demand. The model
evaluation is described in section 5.3.
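The two cost formulas, with the unit costs stated above, can be sketched as follows; the daily stock levels and trips are illustrative, not taken from the simulation runs:

```python
# The two cost formulas with the unit costs stated above: 180 THB per m^3
# of stock per day and 1.85 THB per km per ton. The daily inputs are
# illustrative, not taken from the simulation runs.

HOLDING_RATE = 180          # THB per m^3 per day
TRANSPORT_RATE = 1.85       # THB per km per ton

def total_holding_cost(daily_stock_levels):
    return sum(stock * HOLDING_RATE for stock in daily_stock_levels)

def total_transportation_cost(daily_trips):
    # each trip is (travelled_distance in km, demand_quantity in tons)
    return sum(dist * qty * TRANSPORT_RATE for dist, qty in daily_trips)

stock = [850.0, 790.0, 740.0]              # m^3 held at a hub, per day
trips = [(120.0, 1.2), (95.0, 1.3)]        # (km, tons), per day
print(total_holding_cost(stock))           # 428400.0
print(total_transportation_cost(trips))
```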
provides the lowest RMSE and MAPE scores with the training and test datasets.
Furthermore, the execution time was lower than the other tuning methods. The epoch
iteration was 500, which was taken from a previous study (Kantasa-ard et al. 2019).
Day Real Demand LSTM SVR ARIMAX MLR
0 1194.92 1203.92 1208.16 1241.61 1193.82
1 1271.00 1168.39 1179.93 1148.77 1151.10
2 1046.42 1228.89 1243.15 1228.11 1229.33
3 1204.37 1157.14 1142.08 1137.31 1107.48
4 924.50 1116.25 1172.98 1132.39 1139.39
5 1285.43 1079.12 1062.06 1038.25 1006.81
6 1137.67 1151.09 1187.99 1139.51 1153.65
7 1360.33 1098.63 1170.77 1208.80 1168.79
8 1250.50 1390.81 1254.63 1251.89 1267.26
9 1279.55 1208.25 1249.99 1301.39 1266.15
10 1278.62 1237.26 1239.64 1261.31 1250.31
11 1223.95 1226.49 1246.64 1274.72 1258.91
12 1400.18 1173.53 1217.75 1247.82 1222.93
13 1165.68 1402.03 1299.19 1306.89 1320.61
14 1324.80 1141.39 1216.87 1282.06 1223.43
15 1184.08 1340.30 1246.00 1245.50 1252.95
(A)
(B)
Forecasting Data with time lag2 Data with time lag4 Data with time lag6
Model R2 Train R2 Test R2 Train R2 Test R2 Train R2 Test
SVR 0.93 0.91 0.93 0.91 0.93 0.91
MLR 0.92 0.9 0.92 0.91 0.92 0.91
ARIMAX 0.92 0.57 0.92 0.91 0.93 0.91
LSTM 0.94 0.9 0.93 0.91 0.93 0.92
(C)
Table 3. Examples of real and predicted daily demand with relevant forecasting models
for pineapple with time lag2 (A); Performance of the forecasting model for future
demand of pineapple (B)-(C)
(A)
(B)
Figure 6. Comparison of the trends in forecast and real demand using LSTM and SVR
models with time lag6 (A); ADF statistic score of LSTM demand forecasting with time
lag6 (B).
stationary based on the ADF score. This means that LSTM could work well with more
time-series data. Next, the experiments with other commodity crops are presented, as
shown in tables 4-5 and figures 7-8.
(A)
Forecasting Data with time lag2 Data with time lag4 Data with time lag6
Model R2 Train R2 Test R2 Train R2 Test R2 Train R2 Test
SVR 0.95 0.7 0.95 0.83 0.96 0.86
MLR 0.96 0.95 0.96 0.95 0.96 0.96
ARIMAX 0.96 0.92 0.96 0.96 0.96 0.94
LSTM 0.96 0.95 0.96 0.95 0.96 0.95
(B)
Table 4. Performance of the forecasting model for future demand of cassava (A)-(B)
(A)
ADF statistic: -3.191
Confidence level Critical val.
95% -2.867
90% -2.57
(B)
Figure 7. Comparison of the trends in forecast and real demand using LSTM and
ARIMAX models with time lag4 (A); The ADF statistic score for LSTM demand
forecasting with time lag4 (B).
The performance evaluation in table 4 shows that LSTM performs well even
though Multiple Linear Regression (MLR) and ARIMAX were better in terms of
accuracy and degree of association between predicted and real demand. In this dataset,
the accuracy scores for LSTM were similar to those of the MLR model with time lag2,
whereas ARIMAX performed better with time lag4 and lag6. Regarding the degree of
association, the LSTM scores were very good with all the time lags compared to the
best scores obtained with the ARIMAX and MLR models. In addition, the predicted
demand with time lag4 was stationary based on the ADF score. The best performance of
the LSTM model was the prediction pattern with time lag4, as shown in figure 7.
Forecasting Data with time lag2
Model RMSE RMSE MAPE MAPE MAE MAE MASE MASE
Train Test Train Test Train Test Train Test
SVR 1873.48 2447.49 53.03 91.57 1066.26 1312.98 1.027 1.264
MLR 1912.62 2384.45 20.06 28.09 986.79 1069.35 0.950 1.030
ARIMAX 1901.65 2407.09 19.49 24.34 975.99 1062.43 0.940 1.023
LSTM 1912.48 2329.84 35.2 53.18 1017.05 1155.15 0.979 1.112
(A)
Forecasting Data with time lag2 Data with time lag4 Data with time lag6
Model R2 Train R2 Test R2 Train R2 Test R2 Train R2 Test
SVR 0.96 0.94 0.96 0.94 0.96 0.94
MLR 0.96 0.94 0.96 0.94 0.96 0.95
ARIMAX 0.96 0.94 0.96 0.91 0.96 0.94
LSTM 0.96 0.95 0.96 0.94 0.96 0.95
(B)
Table 5. Performance of the forecasting model for future demand of corn (A)-(B)
(A)
ADF statistic: -3.73
Confidence level Critical val.
95% -2.867
90% -2.57
(B)
Figure 8. Comparison of the trends in forecast and real demand using LSTM and MLR
models with time lag4 (A); The ADF statistic score for LSTM demand forecasting with
time lag6 (B).
The results of the performance evaluation are shown in Table 5. The RMSE and
R² scores demonstrate the good performance of the LSTM model for predicting demand
with time lag2 and lag6. The accuracy scores were better with the ARIMAX model with
time lag2 and the MLR model with time lag4 and lag6. In addition, the predicted
demand with time lag6 was stationary based on the ADF score. Moreover, the best
performance of the LSTM model was the prediction pattern with time lag6, as shown in
figure 8.
The overall performance of the forecasting models implemented for the three
commodity crops under different dataset conditions is summarized in table 6
below.
Table 6. The best performances of the forecasting models for future demand of all
commodity crops and relevant conditions
Regarding the prediction characteristics, all the product series are seasonal.
However, the trends for each product are different: pineapple was non-linear,
whereas the other products were more linear. For this reason, LSTM performed
well in predicting demand for pineapple, and the other classical models, MLR and
ARIMAX, were good at predicting demand for cassava and corn.
Once the forecasting process was finished, the prediction results obtained with
the LSTM forecasting model were used as inputs to calculate the total cost in
the simulation model of the Physical Internet, as described in the next section.
The main purpose was to assess the performance of the distribution flow after
implementing demand forecasting.
5.3 Total cost comparison in the simulation model of a global supply chain in the PI
context
Based on the assumptions of the simulation model stated previously, the
simulation model proposed by Nouiri, Bekrar, and Trentesaux (2018) was adapted
to simulate the distribution flow in the physical internet network inspired by
the distribution centres in the northern region
of Thailand. In the original model (Nouiri, Bekrar, and Trentesaux 2018), demand was
randomly generated and the simulation was implemented to estimate the total
distribution cost.
The forecast demand for pineapple given by the LSTM model was compared
with the real demand via the multi-agent simulator. The holding and transportation costs
were used as KPIs. These costs were also compared with those obtained when
considering real demand, as shown in table 7. The configuration details and
assumptions of the physical internet supply chain simulation are described in
section 4.3, and the physical internet distribution flow is shown in figure 1
above.
The service level was based on sufficient stock levels to cope with daily demand
at each retailer. The holding costs and transportation costs are detailed in table 7.
Forecast Demand Real Demand
Day Total Demand Holding Cost Transportation Cost Total Demand Holding Cost Transportation Cost
0 1203.92 152578.8 5933.68 1194.92 152647.2 5890.6
1 1168.39 143434.8 5759.1 1271.00 142700.4 6264.7
2 1228.89 133815.6 6058.39 1046.42 134510.4 5158.24
3 1157.14 124758 5704.68 1204.37 125085.6 5935.95
4 1116.25 116020.8 5502.89 924.50 117849.6 4557.39
5 1079.12 107575.2 5319.23 1285.43 107791.2 6335
6 1151.09 98568 5672.93 1137.67 98888.4 5607.18
7 1098.63 89917.2 5414.45 1360.33 88243.2 6704.58
8 1390.81 79088.4 6854.23 1250.50 78458.4 6162.68
9 1208.25 69631.2 5956.36 1279.55 68443.2 6307.8
10 1237.26 59947.2 6099.2 1278.62 58435.2 6303.26
11 1226.49 50349.6 6044.77 1223.95 48855.6 6033.44
12 1173.53 41166 5784.03 1400.18 37897.2 6901.84
13 1402.03 30193.2 6910.91 1165.68 28774.8 5745.49
14 1141.39 21261.6 5625.31 1324.80 18406.8 6530
15 1340.30 10771.2 6607.08 1184.08 9140.4 5836.19
Tota
19323.49 1329076.8 95247.24 19532.00 1316127.6 96274.34
l
(A)
Regarding the results in table 7, the small deviations of 0.98% and 1.02% in the holding cost, and of 1.07% and 0.3% in the transportation cost, over 16 days and 31 days respectively, show that the forecasting model remains effective even when the dataset is large. These results could help companies plan the budget for storing and transporting goods based on forecast demand.
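The 16-day deviations quoted above can be reproduced directly from the totals in table 7; the sketch below (plain Python, no external dependencies) computes the holding and transportation cost deviations of the forecast-driven run relative to the real-demand run.

```python
# Totals over the 16-day horizon, taken from table 7 (part A)
forecast_totals = {"holding": 1329076.8, "transport": 95247.24}
real_totals = {"holding": 1316127.6, "transport": 96274.34}

def deviation_pct(forecast_cost, real_cost):
    """Absolute deviation of the forecast-driven cost from the real one, in %."""
    return abs(forecast_cost - real_cost) / real_cost * 100

holding_dev = deviation_pct(forecast_totals["holding"], real_totals["holding"])
transport_dev = deviation_pct(forecast_totals["transport"], real_totals["transport"])
# holding_dev ≈ 0.98 %, transport_dev ≈ 1.07 %
```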
6. Conclusion
This research makes three main contributions. Firstly, the proposed LSTM model performed well for demand forecasting compared with classical machine learning methods, even though the ARIMAX and Multiple Linear Regression models performed well for some products in terms of accuracy, and the overall performance gap with the classical forecasting models was not large. The prediction capability of LSTM was good with continuously fluctuating demand, such as the pineapple dataset, whereas the classical forecasting models were reasonably good with discrete fluctuation. In terms of the degree of association, LSTM captured the patterns of future and real demand better than the other models, based on the coefficient of determination. Secondly, a hybrid metaheuristic was proposed to automate the tuning of the hyperparameters of the LSTM model; both the accuracy and the computational time were better than with the trial-and-error method. Finally, for the total distribution cost in the Physical Internet simulation, the holding cost varied by approximately one percent between forecast and real demand, and the transportation cost varied from 0.3 to 1 percent. The demand forecasting was therefore effective and supported good resource planning and optimization of the total supply chain cost in the context of the Physical Internet.
For future research, it would be interesting to focus further on hybrid forecasting methods. For instance, researchers could combine the LSTM model with other regression models to improve the prediction of customer demand in the supply chain. For hyperparameter tuning, other metaheuristics could be considered to increase the performance of the network structure. Moreover, researchers should consider alternative ways of improving routing, which would further reduce distribution costs in the Physical Internet. Finally, when the number of hubs and retailers grows, the concept of dynamic clustering (Kantasa-Ard et al. 2019) could be applied to cluster groups of distribution hubs and retailers before constructing connected routes and planning the budget for the distribution process.
Acknowledgements
We would like to thank an internship student, Niama Boumzebra, for preparing the dataset used to test the forecasting performance of the classical forecasting methods. We also thank the Office of Agricultural Economics Thailand for providing the initial dataset used to generate the data in this experiment. Finally, we thank Campus France and Burapha University for their sponsorship.
References
Aburto, Luis, and Richard Weber. 2007. “Improved Supply Chain Management Based
on Hybrid Demand Forecasts.” Applied Soft Computing Journal 7 (1): 136–44.
https://doi.org/10.1016/j.asoc.2005.06.001.
Acar, Yavuz, and Everette S. Gardner. 2012. “Forecasting Method Selection in a Global
Supply Chain.” International Journal of Forecasting 28 (4): 842–48.
https://doi.org/10.1016/j.ijforecast.2011.11.003.
Altiparmak, Fulya, Mitsuo Gen, Lin Lin, and Turan Paksoy. 2006. “A Genetic
Algorithm Approach for Multi-Objective Optimization of Supply Chain
Networks.” Computers and Industrial Engineering 51 (1): 196–215.
https://doi.org/10.1016/j.cie.2006.07.011.
Araújo, Teresa, Guilherme Aresta, Bernardo Almada-Lobo, Ana Maria Mendonça, and
Aurélio Campilho. 2017. “Improving Convolutional Neural Network Design via
Variable Neighborhood Search.” Lecture Notes in Computer Science (Including
Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics) 10317 LNCS: 371–79. https://doi.org/10.1007/978-3-319-59876-
5_41.
Bala, P.K. 2010. “Decision Tree Based Demand Forecasts for Improving Inventory
Performance.” IEEM2010 - IEEE International Conference on Industrial
Engineering and Engineering Management, 1926–30.
https://doi.org/10.1109/IEEM.2010.5674628.
Bouguila, Nizar, Djemel Ziou, and Jean Vaillancourt. 2003. “Novel Mixtures Based on
the Dirichlet Distribution: Application to Data and Image Classification.” In
International Workshop on Machine Learning and Data Mining in Pattern
Recognition, 172–81. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-
45065-3_15.
Box, G.E.P., and G.M. Jenkins. 1970. Time Series Analysis. Forecasting and Control.
[Reprint]. Holden Day,San Francisco.
Cao, Jian, Zhi Li, and Jian Li. 2019. “Financial Time Series Forecasting Model Based
on CEEMDAN and LSTM.” Physica A: Statistical Mechanics and Its Applications
519: 127–39. https://doi.org/10.1016/j.physa.2018.11.061.
Chen, Kai, Yi Zhou, and Fangyan Dai. 2015. “A LSTM-Based Method for Stock
Returns Prediction: A Case Study of China Stock Market.” Proceedings - 2015
IEEE International Conference on Big Data, IEEE Big Data 2015, 2823–24.
https://doi.org/10.1109/BigData.2015.7364089.
Cools, Mario, Moons Elke, and Wets Geert. 2009. “Investigating the Variability in
Daily Traffic Counts through Use of ARIMAX and SARIMAX Models: Assessing
the Effect of Holidays on Two Site Locations.” Transportation Research Record
2136 (1): 57–66. https://doi.org/https://doi.org/10.3141/2136-07.
Delhez, Éric J.M., and Éric Deleersnijder. 2008. “Age and the Time Lag Method.”
Continental Shelf Research 28 (8): 1057–67.
https://doi.org/10.1016/j.csr.2008.02.003.
Greff, Klaus, Rupesh K. Srivastava, Jan Koutnik, Bas R. Steunebrink, and Jurgen
Schmidhuber. 2017. “LSTM: A Search Space Odyssey.” IEEE Transactions on
Neural Networks and Learning Systems 28 (10): 2222–32.
https://doi.org/10.1109/TNNLS.2016.2582924.
Harvey, Matt. 2017. “Let’s Evolve a Neural Network with a Genetic Algorithm.”
Coastlineautomotion. 2017. https://blog.coast.ai/lets-evolve-a-neural-network-
with-a-genetic-algorithm-code-included-8809bece164?
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-Term Memory.” Neural Computation 9 (8): 1735–80. https://doi.org/10.1162/neco.1997.9.8.1735.
Janvier-James, Assey Mbang. 2011. “A New Introduction to Supply Chains and Supply
Chain Management: Definitions and Theories Perspective.” International Business
Research 5 (1): 194–208. https://doi.org/10.5539/ibr.v5n1p194.
Kantasa-ard, Anirut, Abdelghani Bekrar, Abdessamad Ait el cadi, and Yves Sallez.
2019. “Artificial Intelligence for Forecasting in Supply Chain Management: A
Case Study of White Sugar Consumption Rate in Thailand.” In 9th IFAC Conference on Manufacturing Modelling, Management and Control MIM 2019, Berlin, Germany. IFAC.
Kantasa-Ard, Anirut, Maroua Nouiri, Abdelghani Bekrar, Abdessamad Ait El Cadi, and
Yves Sallez. 2019. “Dynamic Clustering of PI-Hubs Based on Forecasting
Demand in Physical Internet Context.” In Studies in Computational Intelligence,
853:27–39. Springer Verlag. https://doi.org/10.1007/978-3-030-27477-1_3.
Kim, Hyun jung, and Kyung shik Shin. 2007. “A Hybrid Approach Based on Neural
Networks and Genetic Algorithms for Detecting Temporal Patterns in Stock
Markets.” Applied Soft Computing Journal 7 (2): 569–76.
https://doi.org/10.1016/j.asoc.2006.03.004.
Laguna, Manuel, and Rafael Marti. 2003. “Scatter Search Methodology and
Implementations in C.” In Operations Research/Computer Science Interfaces
Series. Boston,MA: Kluwer Academic Publishers.
Montreuil, Benoit, Russell D. Meller, and Eric Ballot. 2013. Physical Internet
Foundations. Studies in Computational Intelligence. Vol. 472. IFAC.
https://doi.org/10.1007/978-3-642-35852-4_10.
Nag, Ashok K., and Amit Mitra. 2002. “Forecasting Daily Foreign Exchange Rates
Using Genetically Optimized Neural Networks.” Journal of Forecasting 21 (7):
501–11. https://doi.org/10.1002/for.838.
Nouiri, Maroua, Abdelghani Bekrar, and Damien Trentesaux. 2018. “Inventory Control
under Possible Delivery Perturbations in Physical Internet Supply Chain Network.”
In 5th International Physical Internet Conference, 219–31. Groningen.
Ojha, Varun Kumar, Ajith Abraham, and Václav Snášel. 2017. “Metaheuristic Design
of Feedforward Neural Networks: A Review of Two Decades of Research.”
Engineering Applications of Artificial Intelligence 60 (April): 97–116.
https://doi.org/10.1016/j.engappai.2017.01.013.
Werbos, P.J. 1990. “Backpropagation Through Time: What It Does and How to Do It.” Proceedings of the IEEE 78: 1550–60. http://ieeexplore.ieee.org/document/58337/?reload=true.
Qiao, Bin, Shenle Pan, and Eric Ballot. 2019. “Dynamic Pricing for Carriers in Physical
Internet with Peak Demand Forecasting.” IFAC-PapersOnLine 52 (13): 1663–68.
https://doi.org/10.1016/j.ifacol.2019.11.439.
Ryu, Seunghyoung, Jaekoo Noh, and Hongseok Kim. 2016. “Deep Neural Network
Based Demand Side Short Term Load Forecasting.” 2016 IEEE International
Conference on Smart Grid Communications, SmartGridComm 2016, 308–13.
https://doi.org/10.1109/SmartGridComm.2016.7778779.
Sagheer, Alaa, and Mostafa Kotb. 2019. “Time Series Forecasting of Petroleum
Production Using Deep LSTM Recurrent Networks.” Neurocomputing 323: 203–
13. https://doi.org/10.1016/j.neucom.2018.09.082.
Shafiullah, G. M., Adam Thompson, Peter J. Wolfs, and Shawkat Ali. 2008. “Reduction
of Power Consumption in Sensor Network Applications Using Machine Learning
Techniques.” IEEE Region 10 Annual International Conference,
Proceedings/TENCON. https://doi.org/10.1109/TENCON.2008.4766574.
Supattana, Natsupanun. 2014. “Steel Price Index Forecasting Using ARIMA and
ARIMAX Model.” National Institute of Development Administration.
http://econ.nida.ac.th/index.php?
option=com_content&view=article&id=3021%3Aarima-arimax-steel-price-index-
forecasting-using-arima-and-arimax-model-mfe2557&catid=129%3Astudent-
independent-study&Itemid=207&lang=th.
Tyree, Eric W, and J A Long. 1995. “Forecasting Currency Exchange Rates: Neural Networks and the Random Walk Model.” Proceedings of the Third International Conference on Artificial Intelligence Applications.
Zhang, G. Peter, and Min Qi. 2005. “Neural Network Forecasting for Seasonal and
Trend Time Series.” European Journal of Operational Research 160 (2): 501–14.
https://doi.org/10.1016/j.ejor.2003.08.037.
Appendix 1
Autoregressive Integrated Moving Average with Exogenous factors (ARIMAX)
Mathematical formulation: The ARIMAX model combines the ARIMA model with exogenous variables. It is composed of three parts: the autoregressive (AR) model, the moving-average (MA) model, and a linear model of the exogenous part (EX). The notation ARIMAX(p, q, d) refers to a model with p AR terms, q MA terms, and d EX terms. One mathematical formulation of the ARIMAX model is given in equation (7), where Y_t is the value to predict at time period t (in our case the demand), ε_t is the error at time t, and X_t is the vector of exogenous factors at time t. The first term of this equation (on the left side of the equal sign) represents the AR model, the second term (first after the equal sign) represents the MA model, and the third term (second after the equal sign) represents the EX model. The parameters of these models are respectively {φ_1, φ_2, …, φ_p}, {θ_1, θ_2, …, θ_q}, and {η_1, η_2, …, η_d}, and L is the lag operator.
$$\varphi(L)\,Y_t = \theta(L)\,\varepsilon_t + \eta(L)\,X_t \qquad (7)$$

with:

$$\varphi(L) = 1 - \sum_{i=1}^{p}\varphi_i L^i, \qquad \theta(L) = 1 + \sum_{i=1}^{q}\theta_i L^i, \qquad \eta(L) = \sum_{i=1}^{d}\eta_i L^i$$
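Inverting equation (7) gives the one-step forecast Y_t = Σφ_i Y_{t−i} + ε_t + Σθ_i ε_{t−i} + Σ η_i X_{t−i}, which can be sketched in plain Python; all coefficient values below are illustrative assumptions, not fitted parameters.

```python
def armax_step(phi, theta, eta, y_hist, eps_hist, x_hist, eps_t=0.0):
    """One-step ARIMAX forecast per equation (7).

    Histories are ordered most-recent-first: y_hist[0] is Y_{t-1},
    eps_hist[0] is eps_{t-1}, and x_hist[0] is X_{t-1}.
    """
    ar = sum(p * y for p, y in zip(phi, y_hist))      # AR part: phi(L) Y_t
    ma = sum(q * e for q, e in zip(theta, eps_hist))  # MA part: theta(L) eps_t
    ex = sum(n * x for n, x in zip(eta, x_hist))      # EX part: eta(L) X_t
    return ar + eps_t + ma + ex

# Illustrative ARIMAX(1, 0, 1): Y_t = 0.8*Y_{t-1} + 0.5*X_{t-1}
y_next = armax_step(phi=[0.8], theta=[], eta=[0.5],
                    y_hist=[10.0], eps_hist=[], x_hist=[2.0])
# 0.8*10.0 + 0.5*2.0 = 9.0
```

In practice the coefficients would be estimated from data, for instance with the SARIMAX class of statsmodels (via its `exog` argument), rather than set by hand.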
Applications: For trend and causal models, ARIMA can be hybridized with other techniques. One application is the forecasting of the monthly steel price index based on around 58 months of historical data (2009–2014) by Supattana (2014), who used an ARIMAX model with crude oil price and iron ore price as exogenous factors; the experiments showed that ARIMAX achieved higher performance in terms of Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). Cools, Moons, and Wets (2009) considered the trend and seasonality of the dataset in their ARIMAX models to capture their possible effect on daily traffic counts.
Support Vector Regression (SVR):
Applications: As part of a benchmark, Cao et al. (2019) compared this model with an LSTM recurrent neural network to forecast future stock market prices. The empirical results showed that LSTM with adaptive noise achieved better performance than SVR.
Multiple Linear Regression (MLR):
Mathematical formulation: The MLR model expresses the predicted value as a linear function of the explanatory variables, as in equation (9), where β_0 is the intercept, β the vector of coefficients, and ε the error term:

$$Y = \beta_0 + \beta X + \varepsilon \qquad (9)$$

Applications: Ramanathan (2012) implemented MLR to predict the trend of soft drink demand in a company case study in the UK, improving the accuracy of promotional sales forecasts.
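As a minimal illustration of equation (9), the sketch below fits β_0 and β by ordinary least squares on a small hypothetical dataset (the numbers are invented for illustration only):

```python
import numpy as np

# Hypothetical observations, roughly following Y = 1 + 2X
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([3.1, 4.9, 7.2, 8.8])

# Prepend a column of ones so that beta[0] is the intercept beta_0
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
beta0, beta1 = beta

def predict(x):
    """Point forecast from the fitted regression line."""
    return beta0 + beta1 * x
```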
Long Short-Term Memory (LSTM) Neural Network:
Mathematical formulation:
Figure 9. The structure of the LSTM block (Sagheer and Kotb 2019)
In Figure 9, X_t is the input at time t and generally represents the exogenous factors; the operator ⊕ symbolizes pointwise addition; the operator ⊗ symbolizes the Hadamard (element-wise) product; and the σ and τ symbols respectively represent the sigmoid function and the hyperbolic tangent function, although other activation functions are possible. Firstly, the forget gate decides which information must be discarded from the cell state. Secondly, the input gate decides which information must be admitted to the LSTM cell state. Next, the cell state value is updated. Then, the output gate filters which information in the cell state should be produced as output. Finally, the hidden state value is constructed.
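The gate sequence described above can be sketched as a single NumPy forward step; the weight layout and toy dimensions below are illustrative assumptions, not the parameters used in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    """One forward step of an LSTM cell, following the gate order above.

    W, U, b hold the stacked parameters of the four gates
    (forget, input, candidate, output), each of hidden size n.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b          # pre-activations of all four gates
    f = sigmoid(z[0:n])                   # forget gate: what to discard
    i = sigmoid(z[n:2 * n])               # input gate: what to admit
    c_tilde = np.tanh(z[2 * n:3 * n])     # candidate cell values
    c_t = f * c_prev + i * c_tilde        # cell state update (pointwise ops)
    o = sigmoid(z[3 * n:4 * n])           # output gate: what to expose
    h_t = o * np.tanh(c_t)                # new hidden state
    return h_t, c_t

# Toy dimensions: input size 3, hidden size 2, random weights
rng = np.random.default_rng(0)
n_in, n_h = 3, 2
W = rng.normal(size=(4 * n_h, n_in))
U = rng.normal(size=(4 * n_h, n_h))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_cell(rng.normal(size=n_in), h, c, W, U, b)
```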
Applications: Chen et al. (2015) implemented this method to predict the trend of the Chinese stock market; the accuracy increased from 14 percent to 27 percent compared with a random prediction method. In the same field, Long et al. (2019) compared the performance of their proposal (a multi-filter neural network) with that of LSTM for predicting stock price movements. Simoncini et al. (2018) also used it to classify vehicle types from the Global Positioning System (GPS) data of each vehicle.
Appendix 2
Genetic Algorithm
Genetic Algorithms (GAs) apply the concept of natural selection to a population of potential solutions. They are based on the postulate that important processes within a group of organisms of the same species give rise to genetic mixing. These processes occur during the reproductive phase, when the chromosomes of two organisms combine to create a new, potentially better one. GAs imitate these operations in order to gradually evolve populations of solutions.
The main steps of GAs are: (1) Selection: to determine which individuals are more likely to obtain the best results, a selection is made. This process is analogous to natural selection: the best-adapted individuals win the competition for reproduction, while the least adapted die before reproducing. (2) Crossover or recombination: during this operation, two individuals exchange parts of their DNA to produce one or more new individuals. (3) Mutation: randomly, a gene can be substituted for another. As for crossovers, a mutation rate is defined for population changes. Mutation is used to avoid premature convergence of the algorithm.
In general, we start with a base population, which is most often generated randomly. Each solution is assigned a score corresponding to how well it fits the problem, and a selection is then made within this population. The algorithm iterates until convergence is obtained or a stopping criterion is reached. To solve a problem, GAs use the ingredients above together with a representation of a solution, called the solution's encoding, which also has an impact on GA performance. The convergence of GAs is rarely proven in practice, but the crossover operator is very often what gives genetic algorithms their richness compared with other methods.
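The selection, crossover, and mutation steps above can be sketched as a minimal real-valued GA; the encoding, operators, and parameter values below are illustrative assumptions, not the hybrid metaheuristic used in this paper.

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=30, generations=60,
                      crossover_rate=0.9, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    # Initial population: random real-valued chromosomes (the "encoding")
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        def select():
            # Selection: binary tournament, the fitter of two random
            # individuals wins the competition for reproduction
            a, b = rng.choice(pop), rng.choice(pop)
            return a if fitness(a) > fitness(b) else b
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            # Crossover: single-point exchange of gene segments
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: random perturbation to avoid premature convergence
            if rng.random() < mutation_rate:
                child[rng.randrange(n_genes)] += rng.gauss(0.0, 0.5)
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)  # keep the best solution found
    return best

# Maximise -(x1^2 + x2^2); the optimum is the origin (0, 0)
best = genetic_algorithm(lambda c: -sum(g * g for g in c), n_genes=2)
```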
Scatter Search
Scatter Search derives from strategies for combining decision rules and constraints (Laguna and Martí 2003). For example, new rules are generated by weighted combinations of existing rules. The algorithm is flexible and can be implemented for many problems with varying degrees of sophistication. The main ingredients for implementing scatter search are generally:
1. A Diversification Generation Method to generate a random set of trial solutions.
2. An Improvement Method applied to the trial solutions to create enhanced ones (neither the input nor the output solutions are required to be feasible).
3. A Reference Set Update Method to build and maintain a reference set consisting of the “best” solutions found. Solutions gain membership to the reference set according to their quality or their diversity.
4. A Subset Generation Method to operate on the reference set and produce a subset of its solutions as a basis for creating combined solutions.
5. A Solution Combination Method to transform a given subset of solutions produced by the Subset Generation Method into one or more combined solutions.
The process (elements 2 to 5) is repeated until the reference set no longer changes; element 1, the Diversification Generation Method, is then used to diversify. The algorithm stops when a specified iteration limit or stopping criterion is reached. The notion of “best” in step 3 is not limited to a measure given exclusively by the fitness function; in particular, a solution may be added to the reference set if it improves the diversity of the set.
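The five elements above can be sketched for a toy one-dimensional minimisation; the objective, the greedy improvement step, and all parameter values are illustrative assumptions, and the reference set here is ranked by quality only.

```python
import random

def scatter_search(f, lo, hi, ref_size=5, n_trials=20, iters=10, seed=1):
    """Minimise f over [lo, hi] following the five-element template above."""
    rng = random.Random(seed)

    def improve(x, step=0.1, passes=200):
        # 2. Improvement Method: greedy +/- step local search
        for _ in range(passes):
            for cand in (x - step, x + step):
                if f(cand) < f(x):
                    x = cand
        return x

    # 1. Diversification Generation Method: random trial solutions
    trials = [improve(rng.uniform(lo, hi)) for _ in range(n_trials)]
    # 3. Reference Set Update Method: keep the "best" solutions by quality
    ref = sorted(trials, key=f)[:ref_size]
    for _ in range(iters):
        # 4. Subset Generation Method: all pairs from the reference set
        pairs = [(a, b) for i, a in enumerate(ref) for b in ref[i + 1:]]
        # 5. Solution Combination Method: improved midpoints of each pair
        combined = [improve((a + b) / 2.0) for a, b in pairs]
        new_ref = sorted(set(ref + combined), key=f)[:ref_size]
        if new_ref == ref:     # reference set stable: stop
            break
        ref = new_ref
    return ref[0]

# Toy objective: minimum at x = 3
best = scatter_search(lambda x: (x - 3.0) ** 2, -10.0, 10.0)
```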