Food Waste ML Thesis
Weijing Shi
Master’s Thesis
Master of Engineering - Big Data Analytics
May 16, 2022
Arcada University of Applied Sciences
Abstract:
Food waste has become an increasingly important problem globally. Sources of waste include the supply chain, food manufacturing, households, retail stores, etc. This thesis focuses on the food waste problem in the retail industry and aims to predict the potential food waste in a grocery store using deep learning approaches. With a real-world data-set from a grocery store in Finland, various deep learning models (MLP, CNN, LSTM, GRU) have been trained to forecast the upcoming food waste at the product level. The outcomes of the experiments have been evaluated by calculating the RMSE value as well as a business oriented confusion matrix. The study demonstrates the capability of the selected deep learning models in predicting future food waste in a retail context.
CONTENTS

1 Introduction
1.1 Background
1.1.1 Food Waste at Retail Industry
1.1.2 Food Waste in Case Company
1.2 Aim of the project
1.3 Research Question
1.4 Limitations
1.5 Ethical considerations
2 Literature Review
3 Methods
3.1 Data
3.1.1 Data Collection
3.1.2 Data Exploring
3.1.3 Data Pre-processing
3.2 Experiments
3.2.1 Development environment
3.2.2 Implementation
3.3 Evaluation
3.3.1 RMSE
3.3.2 Customized confusion matrix
4 Results
4.1 RMSE
4.2 Customized Confusion Matrix
4.2.1 Accuracy
4.2.2 Precision, Recall, F1
5 Conclusions
5.1 Summary
5.2 Future work
References
FIGURES

Figure 1. Deep learning architecture
Figure 2. CNN architecture
Figure 3. LSTM and GRU architecture
Figure 4. Research Methodology
Figure 5. Ready-to-eat Meal, 2022
Figure 6. Waste Frequency
Figure 7. Daily Waste Overview
Figure 8. Day of the Week Distribution
Figure 9. Number of Features after Encoding
Figure 10. Implementation Pipeline
Figure 11. Model - Multilayer Perceptron
Figure 12. Model - Convolutional Neural Network
Figure 13. Model - Long-short term Memory
Figure 14. Model - GRU
Figure 15. Confusion Matrix
Figure 16. Experiment Result
Figure 17. RMSE Distribution
Figure 18. RMSE Statistics
Figure 19. RMSE - Best Model
Figure 20. Accuracy Distribution
Figure 21. Accuracy Statistics
Figure 22. Precision, Recall and F1 Distribution
Figure 23. Precision, Recall and F1 Key Info
TABLES

Table 1. Dataset Structure
Table 2. Confusion Matrix Definition
ABBREVIATIONS
FAO Food and Agriculture Organization of the United Nations
UNEP UN Environment Programme
ML Machine Learning
FNN Feedforward Neural Networks
CNN Convolutional Neural Networks
RNN Recurrent Neural Networks
LSTM Long Short Term Memory
DIF Demand Influence Factor
IQR Interquartile Range
FOREWORD
After over 12 years of working experience in ERP consulting, I decided to enhance my skill set in the data science area and started the Master's degree studies in Big Data Analytics in the middle of the Covid-19 pandemic. It is not easy to study a new field alongside full-time work; the intensive studies as well as the thesis project required and consumed an abundance of time, focus, patience and determination. Thanks to the support of the teachers at Arcada, my former colleagues at the case company of the thesis, my family and also myself, I am finally finishing the writing of this thesis and accomplishing a second master's degree in my life.
I am about to start a new career journey in data and business consulting tomorrow. Writing the Foreword of the thesis at this moment makes me more than happy.
Weijing Shi
Helsinki, 13.2.2022
1 INTRODUCTION
In recent decades food wastage has become a rising public concern globally, in a context where climate change is constantly worsening. According to FAO (2011), around one third of the food in the world is estimated to be lost or wasted annually, and food waste alone generates about 8 to 10 percent of global greenhouse gas emissions (UNEP 2021). As a result, reducing food waste can directly decrease greenhouse gas emissions and thus help slow down global warming. Generally speaking, food waste can be produced throughout the entire food value chain, mainly in the following areas: agricultural production, post-harvest handling and storage, raw food processing, distribution in wholesale and retail markets, as well as individual household consumption (FAO 2011). This thesis focuses on the food waste problem in the retail industry, specifically in grocery stores. A real-life data-set from one of the biggest food stores in Finland has been studied. The data-set consists of the past two and a half years' wastage history of about 1500 products in the Ready-to-Eat Meals category, along with other relevant features which could impact the wastage, e.g. the daily sales numbers, the stock situation, holidays and promotions. By trying various machine learning and deep learning models on the given data, the objective is to examine how well ML methods are able to predict the potential upcoming food waste for each product. A model trained in this study with satisfactory prediction results could be taken into use by the case company as a food waste prediction service in its production environment.
1.1 Background

The first chapter serves as a background overview, where the overall food waste situation at the retail level as well as at the case company level is introduced.
1.1.1 Food Waste at Retail Industry

At the retail level, food waste arises for several reasons, for instance overstocking due to inaccurate demand prediction. Even though the wastage amount is relatively low compared to other players in the food supply chain, i.e. food waste in households or in the production process, retailers can play an important role in reducing waste because of their unique position in the value chain. First of all, retailers have the capability to sell the potential waste with special pricing techniques; for instance, products approaching their best-before dates with over half-price discounts are usually highly attractive to many grocery consumers. Secondly, retail giants with great procurement power are capable of demanding high standards of goods and services from their manufacturers and logistics partners, which could reduce the potential food waste generated before goods arrive at the stores due to poor logistics handling or manufacturing faults. Last but not least, from the retailer's own business perspective, aiming high in operational excellence, for example improving demand forecast accuracy, can alleviate the overstocking situation, so that goods are not ordered far beyond the actual need and thus avoid being wasted.
In summary, it can be concluded that the retail industry does not produce as much food waste as other players in the food value chain, but retailers can effectively influence the food wastage situation through their unique role.
1.2 Aim of the project

The aim of this thesis is to examine how well deep learning models such as Convolutional Neural Networks and Recurrent Neural Networks can help forecast the potential upcoming food waste in a supermarket based on historical data. It is believed that waste predictions of good quality can help the store plan actions in advance for the products which are likely to be wasted, so as to largely reduce the waste being generated.
1.3 Research Question

The research question of this study is:

Predict the upcoming food waste in a grocery store based on the historical transactional data.

The research question is a typical time series forecasting problem. Deep learning, or artificial neural networks, are the main methods used to tackle the problem in this study.
1.4 Limitations
The data-set is derived from one hypermarket in an industrialized country and the concerned merchandise category is ready-to-eat meals. Therefore the presented results are limited to similar types of environments and contexts.
2 LITERATURE REVIEW
Time series forecasting is widely used in many applications such as stock price forecasting, weather prediction, traffic forecasting and so on; in this study it is applied to the food waste problem in a retail grocery store. In the following sections, the relevant theories are reviewed, including machine learning, deep learning and their applications to time series problems.
Supervised Learning Supervised learning refers to the situation where the algorithms are trained under human supervision. The original data set consists of tagged labels along with the data features. For example, an image itself contains data features such as colors and shapes, and its label can be dog or cat. By training a machine learning model with thousands of such labeled images, the model can learn how to classify a new image as dog or cat. Regression and classification are the two most well-known and popular problem types using supervised learning. In the case of a regression problem the goal of the model is to predict the output as a numerical value, e.g. predicting what a stock price will be, while a classification model aims to predict a categorical value instead, for instance telling whether the stock price will go up or down tomorrow.
Unsupervised Learning Unsupervised learning means the algorithms are not supervised by humans, as the training data is not labeled. Clustering and association analysis are the common problem types using unsupervised learning.
The purpose of a clustering problem is to group the data points into clusters without telling the model which specific conditions to follow for the grouping. A typical use case of clustering is to create customer groups with similar shopping behavior based on receipt data. Association analysis on the other hand is meant for discovering relations between variables. In the context of the grocery trade, with a huge amount of receipts as training data, association analysis can help to find out which products are most often purchased together by the consumers.
• Linear models

Linear models are traditional statistical models such as AR, MA, ARIMA and SES, while deep learning models are usually considered non-linear due to their wide usage of activation functions; the non-linear models are the main methods experimented with in this study.
Exponential Smoothing (SES). ARIMA is one of the most popular and widely adopted time series analysis methods, developed by Box and Jenkins (1976). ARIMA is derived from the AR (autoregressive) and MA (moving average) methods, and is meant for fitting a class of linear time series models. The ARIMA model is appropriate for stationary time series data, while the SES model is appropriate for non-stationary data (i.e. data with a trend and seasonality). The limitation of the linear models is however that they require the variables to be independent of each other, and they do not account for the latent dynamics existing in the data (Selvin et al. 2017).
Like traditional machine learning models, deep learning can work on supervised problems, e.g. regression and classification, as well as unsupervised problems like clustering. In the following paragraphs, we review the algorithms of the common deep learning models with which the food waste data-set has been experimented.
Figure 1. deep learning architecture
(Bahi & Batouche 2018)
Figure 2. CNN architecture
(Blaji 2020)
A convolution is one type of matrix multiplication that is applied to the original input object or the previous set of feature maps, in order to capture the relevant features. Pooling layers are intended to reduce the number of computations by reducing the dimensionality of the problem; the most popular ones are e.g. AveragePooling and MaxPooling. Fully connected layers are usually put before the classification output of a CNN and are used to flatten the results before classification. CNN can also be applied to time series problems due to its feature extraction ability, and is thus also experimented with in this project.
Figure 3. LSTM and GRU architecture
(Phi 2018)
LSTM Long Short Term Memory networks, usually just called "LSTMs", are a special kind of RNN, capable of learning long-term dependencies. LSTM was introduced by S. Hochreiter (1997) and is meant for addressing the problems with traditional RNNs, including vanishing gradients, exploding gradients and the inability to remember or forget certain aspects of the input sequences (Rivas 2020). Three types of gates are the key components in LSTM networks which make them different from traditional RNNs: the so-called forget gate, input gate and output gate, as shown in Figure 3. The gates control how information flows through the cells, and can learn what information should be kept or forgotten during the process. These mechanisms are trainable and optimized for each and every dataset of sequences (Rivas 2020). Therefore LSTM is particularly suitable for dealing with sequential data, e.g. text, speech and general time-series data.
GRU The Gated Recurrent Unit (GRU) is another, newer variant of RNN, aiming to alleviate the vanishing gradient problem associated with traditional RNNs. The design of GRU is similar to LSTM, but it contains two types of gates: the update gate and the reset gate. The update gate helps the model determine how much of the past information should be passed on to the future, while the reset gate decides how much of the past should be forgotten.
As explained, LSTM and GRU are both advanced versions of RNN and good at handling sequential data-sets; therefore both models have been applied to the food waste data. In the later chapters, the experiments conducted in this study along with their results are shared.
3 METHODS
So far the previous chapters have clarified the business problem to be addressed and the theories behind the relevant deep learning models being experimented with. In the Methods part, the research methodology of the study is described. The process chart in Figure 4 illustrates the key components of the applied methods and the logical relationships between them. In the upcoming paragraphs, we start by introducing the data collection process, and continue with exploring the raw data to catch some general insights. The pre-processing activities are then explained, showing how the data is made ready for feeding the selected deep learning models. Finally, the evaluation approaches are described, showing how the experiment results have been measured.
3.1 Data
The data-set was prepared by the author from scratch for the purpose of this project. The scope of the data-set was decided together with the domain expert in the case company as follows:
Figure 5. Ready-to-eat Meal, 2022.
The Store The case company had about a 36.8% market share in the Finnish food trade market in 2021. The store selected for the study belongs to the hypermarket chain of the case company, which combines a department store and a grocery supermarket. In 2020 there were 81 such stores all around Finland. The store in question is located in one of the most popular shopping centers in the Helsinki capital area. It offers a wide assortment that can fulfill most of a household's daily consumption needs.
Ready-to-eat Food The products included in the data-set belong to the ready-to-eat meals category. Such food has usually been cooked or prepared in advance and can be eaten directly, for example the individually packed salads, soups and wok dishes shown in Figure 5.
Table 1. Dataset Structure
Data Template
Product Date Stock Sales Waste GdsRceipt DIF ID DIF Grp WkDay
10002000 01.01.2019 5 10 1 6 ABC X0 2
10002000 02.01.2019 6 8 0 0 BCD X1 3
... ... ... ... ... ... ... ... ...
Ready-to-eat food has some common characteristics, i.e. it is easy to use, comes in convenient packaging, is stored at cool temperatures, and has a relatively short shelf life, which leads to the fact that food in this category becomes waste more easily than e.g. processed food such as biscuits or canned tuna.
Time Period The time series in question covers the period between Jan 2019 and Sep 2021. As is well known, since spring 2020 when the Covid-19 pandemic started, consumers' grocery shopping behavior has changed considerably, which is also reflected in the demand for readymade food. The data-set covers days both before and after the start of the pandemic, so that we shall be able to see how well the models are capable of dealing with such consumption changes caused by external demand influence factors.
Based on the scope of the data described above, the data collection process was started accordingly.
Figure 6. Waste Frequency
As can be seen from Table 1, the primary key of the data-set consists of the product ID and the date, which means each row is meant for one particular product on one particular day. The first four features are numerical values: the stock balance at the end of the day, and the total quantities sold, wasted and received on that day. The fifth and sixth features are related to the Demand Influence Factor (DIF). DIF ID refers to the identification of the DIF, for example DAD is meant for Father's Day and MOM for Mother's Day, while DIF Group combines similar types of DIF IDs together. In the previous example both DAD and MOM belong to the same DIF Group, e.g. H01. The last feature is the day of the week, aiming to capture cyclical patterns on a weekly basis. Next we will explore the content of the data-set for some general insights.
Figure 7. Daily Waste Overview
Imbalanced Data-set Let's first have a look at how frequently waste happens among these 45 ready-to-eat products. As can be seen from Figure 6, for most of the products the total number of days with a waste record is less than 3 months out of the 33 months in the study scope. The median value is around 70 days, and the most frequently wasted product has waste records on 167 days during the study period. We can also conclude that the given data-set is imbalanced, since the number of days with waste is much smaller than the number of days without.
Cycle and Season Secondly, we check the cyclical or seasonal behavior of the given time series data. Figure 7 illustrates the daily aggregated waste quantity of all the products in question over the past years. A weekly cycle can easily be identified from the chart: Friday often reaches the peak of the food waste for that week, while Sunday is usually the trough. This finding reflects the labour shift schedule of the waste inspection and disposal activities in the store. There is, though, no obvious seasonal movement to be found over the past two and a half years. Figure 8 provides another view by aggregating the numerical features on each day of the week from Monday to Sunday, which confirms the previous finding that waste normally happens on working days. On the other hand, we can also see that sales and stock balance do not have an obvious cyclical pattern, while the target store usually receives the replenishment of the concerned product group on Monday, Wednesday and Friday.
Figure 8. Day of the Week Distribution
Data Cleaning In the data cleaning step, the products without enough data points have been removed from the raw data. Here we define enough data as follows: 1) a product must have more than 800 days of valid stock balance data, 2) a product must have more than 800 days of valid sales data, and 3) a product must have more than 50 days of waste records. The number of products is thus reduced from 1500 to 45.
In addition, the missing values have also been handled in this phase. The missing values are mostly found in the numerical features. Based on the data source and the meaning of each feature, they have been processed so that missing sales and stock values take the previous day's corresponding figure, while missing waste and replenishment values are set to zero.
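The following is a minimal sketch of the cleaning and imputation logic described above, assuming a Pandas DataFrame df with the columns of Table 1; the column names and the exact validity checks are illustrative, not the author's actual code:

import pandas as pd

def clean_and_impute(df: pd.DataFrame) -> pd.DataFrame:
    # Keep only products with enough valid data points.
    def has_enough_data(g: pd.DataFrame) -> bool:
        return (g["Stock"].notna().sum() > 800
                and g["Sales"].notna().sum() > 800
                and (g["Waste"] >= 1).sum() > 50)

    df = df.sort_values(["Product", "Date"])
    df = df.groupby("Product").filter(has_enough_data)

    # Missing sales and stock take the previous day's value;
    # missing waste and replenishment are set to zero.
    df[["Sales", "Stock"]] = df.groupby("Product")[["Sales", "Stock"]].ffill()
    df[["Waste", "GdsRceipt"]] = df[["Waste", "GdsRceipt"]].fillna(0)
    return df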
Data Encoding For data encoding, the categorical features need to be converted into numerical values in order to be recognized by the ML models. The features DIF ID and DIF Group are the categorical features to be converted in our data-set, and the Pandas get_dummies method has been used to perform the encoding. Figure 9 provides an overview of the number of features each product has after encoding.
Figure 9. Number of Features after Encoding
Most products have more than 30 features, indicating that the DIF related features largely enrich the original data-set on top of the four basic numerical features.
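As a brief illustration of the encoding step, the DIF columns can be one-hot encoded as sketched below (the column names follow Table 1 and are illustrative):

import pandas as pd

# One-hot encode a categorical DIF column; the numerical features pass through.
# Feature set A keeps DIF ID, feature set B keeps DIF Group.
feature_set_a = pd.get_dummies(df.drop(columns=["DIF Grp"]), columns=["DIF ID"])
feature_set_b = pd.get_dummies(df.drop(columns=["DIF ID"]), columns=["DIF Grp"])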
After the previous steps, the data preparation is complete and the data is ready for experimenting with the deep learning models.
3.2 Experiments
The experiments are described from three perspectives: 1) the development environment where the experiments were performed, 2) the detailed implementation process and 3) the principles for evaluating the models.
Figure 10. Implementation Pipeline
3.2.2 Implementation

The implementation took place at the product level, and each product was trained with the five deep learning models. The implementation pipeline is illustrated in Figure 10. There are in total 45 products in the data-set. Each product went through the preprocessing activities and was divided into two feature sets: feature set A includes DIF ID and feature set B includes DIF Group; the other features are exactly the same. Both feature sets were then split into 80% and 20% over the past two and a half years' time span for training and testing purposes, and fed to the deep learning models MLP, CNN, LSTM (with two window sizes) and
Figure 11. Model - Multilayer Perceptron
GRU. A minimal sketch of the chronological train/test split is shown below, after which we go through the details of each model one by one.
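The sketch assumes a feature matrix X and a label vector y already ordered by date (the variable names are illustrative):

# Split chronologically rather than randomly:
# the first 80% of the days for training, the remaining 20% for testing.
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]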
Multilayer Perceptron - MLP MLP is used as the baseline model due to its simple architecture. The model consists of two fully connected layers as shown in Figure 11; other parameters include ReLU activation and 100 units for the first Dense layer, a learning rate of 0.0003, the Adam optimizer and 60 epochs.
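A minimal Keras sketch of such an MLP follows; the hyperparameters come from the text above, while the single-unit regression output, the MSE loss and the variable n_features (the per-product feature count) are assumptions:

from tensorflow import keras
from tensorflow.keras import layers

# Baseline MLP: two fully connected layers.
mlp = keras.Sequential([
    layers.Dense(100, activation="relu", input_shape=(n_features,)),
    layers.Dense(1),  # regression output: predicted waste quantity (assumed)
])
mlp.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0003), loss="mse")
mlp.fit(X_train, y_train, epochs=60, validation_data=(X_test, y_test))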
Convolutional Neural Network - CNN CNN is known for its powerful feature extraction capability by means of its convolutional and pooling layers. The detailed structure of the CNN model in this experiment can be found in Figure 12.
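The exact layer configuration is given in Figure 12; as a stand-in, a typical 1D CNN for windowed time series input might look as follows (all layer sizes here are assumptions, not the thesis's actual configuration):

from tensorflow import keras
from tensorflow.keras import layers

# Convolution and pooling for feature extraction, then dense layers.
cnn = keras.Sequential([
    layers.Conv1D(64, kernel_size=3, activation="relu",
                  input_shape=(window_size, n_features)),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(50, activation="relu"),
    layers.Dense(1),
])
cnn.compile(optimizer="adam", loss="mse")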
Figure 13. Model - Long-short term Memory
LSTM LSTM is an advanced RNN which is good at dealing with time series problems. In our experiment, the sliding window method is used to form the input that is fed to the model. Both window sizes 7 and 14 have been tried, and a dropout layer is added to help the model generalize. The LSTM model structure is shown in Figure 13.
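A sketch of the sliding-window construction and the LSTM model follows; the window alignment, unit count and dropout rate are assumptions:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def make_windows(features: np.ndarray, labels: np.ndarray, window: int = 7):
    # Each sample holds `window` consecutive days of features;
    # its label is the one aligned with the window's last day.
    X = np.stack([features[i:i + window]
                  for i in range(len(features) - window + 1)])
    y = labels[window - 1:]
    return X, y

lstm = keras.Sequential([
    layers.LSTM(50, input_shape=(7, n_features)),  # window size 7 (14 also tried)
    layers.Dropout(0.2),                           # dropout for generalization
    layers.Dense(1),
])
lstm.compile(optimizer="adam", loss="mse")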
GRU Last but not least, another advanced RNN model, GRU, is tested. Two GRU layers and two dropout layers are applied, followed by a dense layer at the end (Figure 14).
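A minimal sketch mirroring this description, with unit counts and dropout rates as assumptions:

from tensorflow import keras
from tensorflow.keras import layers

# Two stacked GRU layers, each followed by dropout, then a dense output layer.
gru = keras.Sequential([
    layers.GRU(50, return_sequences=True, input_shape=(window_size, n_features)),
    layers.Dropout(0.2),
    layers.GRU(50),
    layers.Dropout(0.2),
    layers.Dense(1),
])
gru.compile(optimizer="adam", loss="mse")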
Now that each product has been tried with the models described above, it is time to evaluate how these models have performed.
3.3 Evaluation
In this study the performance evaluation of the deep learning models has been conducted from two perspectives. Firstly, we review the Root Mean Square Error (RMSE) value of each experiment at the product and model level, aiming to measure the quality of the estimator by means of the deviation between the predicted and actual values. The second type of evaluation is taken care of by a customized confusion matrix. The idea is to define business oriented criteria to classify the predicted numeric values into a positive or negative group, and to summarise the results in a confusion matrix.
Figure 14. Model - GRU
The related scores are calculated alongside, based on which we shall be able to see the performance of each model on the complete product list.
It is worth mentioning that during the implementation phase the time series data-set has been converted into a supervised learning form, in such a way that the label of the current day is the waste quantity two days into the future. The data-set has then been split into 80 and 20 percent for training and testing purposes, which is to say that 80 percent of the data has been used to train the model to predict the waste quantity in two days, and 20 percent of the data is used for calculating the performance indicators needed by the evaluations. More detailed descriptions of the evaluation methods are explained in the following paragraphs.
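In Pandas terms, the conversion amounts to shifting the waste column backwards, as sketched below for a per-product DataFrame sorted by date (the column names are illustrative):

# The label for day X is the waste quantity on day X+2.
df["label"] = df["Waste"].shift(-2)
df = df.dropna(subset=["label"])  # the last two days have no label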
3.3.1 RMSE
Root Mean Squared Error, or RMSE for short, is a standard way to measure the error of a model predicting quantitative data. It is calculated as the square root of the mean of the squared deviations between the predicted and actual values. The formula of RMSE is written below:
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - x_i)^2}    (1)
In the formula above, x_i denotes the actual value and y_i represents the predicted value. In our particular case, RMSE is calculated within each experiment trial by summing the squared deviations between every testing day's predicted and actual waste, dividing by the number of testing days and taking the square root. In this way we obtain the average prediction deviation of the particular model on each product.
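For reference, the per-trial computation is equivalent to the following sketch:

import numpy as np

def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    # Square root of the mean squared deviation over the testing days.
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))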
3.3.2 Customized confusion matrix

There are four indicators in a standard binary confusion matrix, as shown in Figure 15: True Positive (TP) - correctly predicted event values, True Negative (TN) - correctly predicted non-event values, False Positive (FP) - wrongly predicted event values, and False Negative (FN) - wrongly predicted non-event values.
The output of our deep learning models is a waste value in numeric format, as the problem itself has been handled as a regression one. To convert a regression problem into a classification one, proper rules must be defined to categorize the numeric predicted value into either a positive or a negative class. The confusion matrix used in the model evaluation of this study has been defined as follows. In the operative circumstances the minimum waste quantity is 1 when at least one box is expired or damaged, so when the actual value is greater than or equal to 1, it is considered positive. When classifying the predicted value, on the other hand, it is first compared with a certain threshold; if the predicted value is greater than or equal to the threshold it is classified as positive, meaning that the model predicts that food waste will happen in two days. Such a prediction is considered correct if actual waste happens at least once within the next three days. The complete definition of the confusion matrix is listed in Table 2.
Figure 15. Confusion Matrix
As illustrated in Table 2, on Day X if the predicted value is greater than or equal to the threshold, and actual waste has happened on Day X, Day X+1 or Day X+2, the prediction is classified as True Positive (TP). Under the same circumstance, if no waste happened on Day X, Day X+1 or Day X+2, the prediction is a False Positive (FP). In addition, on Day X if the predicted value is less than the threshold, and actual waste has happened on Day X, Day X+1 or Day X+2, the prediction is classified as False Negative (FN). Otherwise a True Negative (TN) is marked, i.e. when the predicted value is less than the threshold and no actual waste has happened at all during these three days. The intuition behind the confusion matrix is that if food waste comes in the next 1, 2 or 3 days and we are able to predict it correctly today, the prediction can be considered valuable, because the business gets at least 1 day to plan for the potential waste in advance.
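The classification rule of Table 2 can be summarised in the following sketch, where the function name and signature are illustrative:

def classify_day(pred: float, actual_next3: list, threshold: float) -> str:
    # actual_next3 holds the actual waste quantities on Day X, X+1 and X+2.
    waste_happens = any(a >= 1 for a in actual_next3)
    if pred >= threshold:
        return "TP" if waste_happens else "FP"
    return "FN" if waste_happens else "TN"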
Apart from counting the true and false classifications, the following scores associated with the confusion matrix have also been calculated for each product and model combination: accuracy, recall, precision and F1 score.
Accuracy Accuracy represents the number of correctly classified data instances over the total number of data instances. It is calculated with the formula below:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (2)
Accuracy might not be the best measure when the data-set is imbalanced. As stated in Section 3.1.2, our data-set is imbalanced because most products have many more negative values than positive ones. In such a scenario, even if the model fails to predict the positive values, the accuracy score can still be high.
Precision Precision, also called positive predictive value, is defined as the ratio of correct positive predictions to the total number of predicted positives. Its calculation is expressed in the following formula:

Precision = \frac{TP}{TP + FP}    (3)
Precision is an appropriate performance indicator when minimizing false positives is the
focus.
Recall Recall, also known as sensitivity or true positive rate, is defined as the ratio of correct positive predictions to the total number of positive examples:

Recall = \frac{TP}{TP + FN}    (4)
F1 Score The F1-score is a metric which takes both precision and recall into account and is defined as follows:

F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}    (5)

Since the F1-score combines both precision and recall, it is a better measure than accuracy, especially for an imbalanced data-set.
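A sketch of how the four scores can be computed from the confusion matrix counts, including the undefined cases discussed later in the Results chapter (the function is illustrative):

def scores(tp: int, tn: int, fp: int, fn: int) -> dict:
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Precision and recall are undefined when their denominator is zero,
    # in which case no score is reported.
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    if precision is not None and recall is not None and precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = None
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}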
Due to the different consumption and replenishment patterns, it is not likely that one model behaves best on all the products. Some products might work better with model A, while others perform better with model B. With the evaluation approaches defined so far, we shall be able to evaluate the quality of each experiment trial as well as the overall behavior of each model.
4 RESULTS
In this section, we will go through the results of the experiments according to the evalua-
tion methods defined in the previous chapter.
The outcomes of in total 1350 experiments have been recorded, covering:
• 45 Products
• 5 Models
• 2 Feature Sets
• 3 Thresholds
The results have been documented in a two-dimensional table as illustrated in Figure 16. As shown, each row is associated with one particular trial on a product and model basis, which can be identified by the first two columns: the 'article' column contains the product code used in the case company's ERP system, and the 'model' column indicates the corresponding ML model name. 'feature_set' indicates whether DIF ID or DIF Group is included in the feature selection, and the 'rate' column tells the threshold used in the confusion matrix classification. The columns 'val_loss' and 'train_rmse' store the RMSE values of the training and testing data, and the confusion matrix related indicators 'TP', 'TN', 'FP' and 'FN' are also counted and saved.
In addition, the number of features and the size of the data points are also included in the result table. The implications behind the numbers are explored in the following sections. We will use a similar approach to go through the outcome of each evaluation, checking the value distribution via box plots together with the key statistical summary, aiming to find out the overall performance of the deep learning models on the target data-set, as well as the similarities and differences among the models.
4.1 RMSE
Root Mean Squared Error, also known as RMSE, provides a straightforward measure of the difference between the predicted and actual values. The smaller the RMSE value, the better the estimator works. RMSE becomes zero when the predicted values are exactly the same as the actual values.
In Figure 17, boxplots are used to provide an overview of the RMSE value distribution achieved by the five models on the 45 products, and a color code is used to mark each feature set. As seen, the shape of the boxplot is quite similar for all models: the majority of the products have RMSE distributed between 0 and 1, and outliers are only found beyond the maximum value of the boxes. Feature set DIF Group's RMSE value is in general smaller than feature set DIF ID's for most of the products, because the median line and the IQR box of the former are mostly closer to zero than those of the latter. When comparing across the models, GRU and MLP have smaller RMSE median and IQR values, which indicates that the predictions made by these two models are more accurate for most of the products than those of the remaining models. On the other hand, the LSTM models seem to have a wider IQR and a couple of extremely large outliers, which tells us the LSTM model has not worked well on a few specific products.
Figure 18 provides the key statistics of the RMSE values grouped by model name and feature set as supplementary information. According to Figure 18, we can see the median, average, maximum and minimum RMSE values over all the 45 products per model, where the GRU model is observed to outperform the others in terms of the lowest median value (0.44) and mean value (0.52). MLP, the baseline model, surprisingly ranks in second place even with its simple architecture.
Figure 17. RMSE Distribution
The predictions made by the CNN and LSTM models, on the other hand, have turned out to deviate more from the actual values compared to their peers. The maximum RMSE achieved by the LSTM model is found to be much higher (7.71) than that of the rest of the models (between 1 and 2).
Figure 19 provides a third angle, showing for each product which model achieved the lowest RMSE value. From this perspective GRU has again performed the best, on 28 products out of 45, which is about 62% of the total products in scope. Second is MLP, which works best for 7 products, followed by the two LSTM models and CNN, which respectively suit 5, 4 and 1 product best.
Figure 19. RMSE - Best Model
In short, the RMSE analysis can be concluded as follows: all five deep learning models show a similar pattern of RMSE value distribution, where the data range is between 0 and 1, the median and IQR are skewed towards zero, and outliers exist only beyond the maximum value. Feature set DIF Group has a smaller RMSE mean value than DIF ID for all the models, which might suggest that the generalized DIF makes it easier for the models to learn the waste pattern. Among the models studied, the GRU model has performed the best from the RMSE perspective, based on the fact that GRU achieves the smallest RMSE value for most of the products in concern.
4.2 Customized Confusion Matrix

4.2.1 Accuracy
Accuracy is calculated by dividing the total number of correctly predicted values (TP+TN) by the total number of predicted values (TP+TN+FP+FN); therefore the accuracy score takes positive and negative values into consideration with equal importance. The range of accuracy is between 0 and 1, and the best score of 1 is reached when the predictions are 100% correct.
The distribution of each model's accuracy score is illustrated in Figure 20, where the color indicates the specific rate being used.
Figure 20. Accuracy Distribution
Many similarities are shared among all the models; for instance, the greater the rate, the higher the median and IQR are located. In addition, outliers only exist outside of the left whisker of the box plots for some of the models, while there are no outliers at all beyond the right whiskers. The median value of the accuracy score is around 0.83 for all the models.
In line with the previous finding in the RMSE analysis, when comparing the performances among the models, GRU has again outperformed the rest with regard to having a higher accuracy score for most products. The LSTM models are next to GRU, with relatively high median and IQR values but also a few low-value outliers. With the MLP and CNN models, most of the products have got lower accuracy scores.
The statistics of the accuracy scores per model and rate are further listed in Figure 21. As seen in Figure 21, accuracy is generally higher for rate 1 than for rates 0.8 and 0.9. The median value is between 0.81 and 0.87 and the mean value between 0.78 and 0.85 across the various combinations of model and rate. Among the five models in concern, GRU has achieved the highest median and mean accuracy scores; for example, its median accuracy value reaches 0.87 and its mean value 0.85. The median and mean values of the LSTM models are slightly lower than GRU's, at around 0.85. MLP and CNN show a bigger difference, with mean values near 0.81 and median values around 0.83.
Figure 21. Accuracy Statistics
In summary, the analysis of the accuracy score reveals that the accuracy score grows as the threshold rate increases, and that most products have achieved an accuracy score above 0.8. The GRU and LSTM models behave better than CNN and MLP in terms of high median and mean accuracy values.
It is worth mentioning that due to the imbalanced nature of the data-set, where negative values are much more numerous than positive ones, the accuracy score might not provide a good insight into how well the models are in fact able to predict the positive values. The upcoming analysis of the precision, recall and F1 scores shall shed more light on this regard.
4.2.2 Precision, Recall, F1

We will first check the overall value distribution of the three scores in the box plots shown in Figure 22.
The three charts in Figure 22 are respectively for the precision, recall and F1 scores, and each chart contains five box plots for the five studied models. Again, a color code is used to indicate the classification threshold, and the same X axis is shared by the three charts to facilitate score comparison. In general, most products' three scores lie between 0 and 0.4.
Figure 22. Precision, Recall and F1 Distribution
Figure 23. Precision, Recall and F1 Key Info
It can also be observed that a smaller threshold leads to a smaller precision score but a bigger recall score. F1 is determined by both precision and recall, and the best F1 score is achieved by the LSTM model with window size 7, at threshold 0.8. Now let's zoom into the performance of each model. In the earlier result review, the GRU model got the best performance in terms of the RMSE and accuracy scores; however, it behaves the opposite way for the precision, recall and F1 scores, with the lowest median and IQR among all models. The remaining four models, on the other hand, show similar behavior in the recall score distribution, while the LSTM models have higher median precision scores than the others.
The key statistics of the three scores grouped by model and rate are listed in Figure 23, which includes the count, mean, standard deviation, minimum, quartile and maximum values of each score per model. There are 45 products being evaluated in the confusion matrix; however, the counts of all scores are under 45, because the precision score is not available when TP+FP is zero for certain products, and similarly the recall score cannot be calculated when TP+FN is zero. The F1 score depends on both precision and recall, and therefore only has a value when the other two scores are available.
When browsing further over the mean and quartile values in the table, it can be observed that the scores vary from model to model, but the overall mean and median values are quite low. The mean precision and recall scores of the five models are around 0.2, while F1 is slightly lower, between 0.08 and 0.17. In addition, the minimum value of the three scores is 0 for all models, which indicates that the deep learning models in this study fail, on at least some products, to correctly predict any waste.
If we compare the scores across the models, GRU has the lowest mean and median values in all three scores, but it also achieved the highest maximum recall score of 1, which none of the others did. The LSTM models, on the other hand, have better scores in the evaluation of precision, recall and F1.
The analysis of precision, recall and F1 reaches a completely different conclusion than the previous analyses of RMSE and accuracy. The RMSE and accuracy scores are in general at a decent level for almost all the models, where the GRU model ranks the best among the studied models. However, according to the precision, recall and F1 scores, which are more suitable for describing the imbalanced data-set, the quality of the predictions made by the deep learning models is not very satisfactory, as the majority of the products got these three scores close to 0.2 or below.
5 CONCLUSIONS
5.1 Summary
In this thesis we have experimented with four deep learning model types, MLP, CNN, LSTM and GRU, on a time series data-set, aiming to find out how well the deep learning models are able to predict the potential food waste in the near future. The data-set consists of historical data of the Ready-to-Eat products from a grocery store in Finland, across the time period from Jan 2019 to Sep 2021. After data cleaning, 45 products with enough waste records were retained for training and testing. The performance of the experiments has been evaluated by means of the RMSE value and a customized confusion matrix. Most products got an RMSE value between 0 and 1 for all models, where the GRU and MLP models achieved smaller mean and median RMSE compared with the others. Regarding the confusion matrix related score measurements, the majority of the products got an accuracy score over 0.8, while the median values of the precision, recall and F1 scores are near 0.2. In addition, the LSTM models have been observed to be slightly better than their peers concerning the confusion matrix related performance; GRU, on the other hand, lags behind in this regard.
REFERENCES
Abirami, S. & Chitra, P. 2020, Chapter Fourteen - Energy-efficient edge based real-time healthcare support system, In: Pethuru Raj & Preetha Evangeline, eds., The Digital Twin Paradigm for Smarter Systems and Environments: The Industry Use Cases, Advances in Computers, vol. 117, Elsevier, pp. 339–368. Available: https://www.sciencedirect.com/science/article/pii/S0065245819300506.
Bahi, Meriem & Batouche, Mohamed. 2018, Deep Learning for Ligand-Based Virtual
Screening in Drug Discovery.
FAO. 2011, Global Food Losses and Food Waste. Extent, Causes and Prevention.
Schneider, Felicitas & Eriksson, Mattias. 2020, Food Waste (and Loss) at the Retail Level.
Lek, S. & Park, Y.S. 2008, Multilayer Perceptron, In: Sven Erik Jørgensen & Brian D. Fath, eds., Encyclopedia of Ecology, Oxford: Academic Press, pp. 2455–2462. Available: https://www.sciencedirect.com/science/article/pii/B9780080454054001622.
Phi, Michael. 2018, Illustrated Guide to LSTM's and GRU's: A step by step explanation. Available: https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21.
Rivas, Dr. Pablo. 2020, Deep Learning for Beginners, Packt Publishing.
Samuel, A. L. 2000, Some studies in machine learning using the game of checkers, IBM
Journal of Research and Development, vol. 44, no. 1.2, pp. 206–226.
Stenmarck, Å., Jensen, C., Quested, T. & Moates, G. 2016, Estimates of European food waste levels.