Ifmpt2024 343 349
Ifmpt2024 343 349
Volume 88 (2024)
1. Introduction
The necessity for accurate sales prediction has grown to be an essential requirement in the
successful management of most businesses [1]. Sales prediction is a process of predicting future sales
for products in various fields, such as fashion products [2], E-Commerce markets [3], business firms
[4], retail company [5] and even individual products in supermarkets [6]. Sales prediction is essential
for business success, enabling businesses to plan based on forecasts proactively. However, it is
undeniable that predictions inherently carry some degree of bias or, in other words, a margin of error.
Nonetheless, the choice of different models for sales forecasting can yield varying levels of accuracy,
ranging from low to high. It is advisable to integrate additional tasks into the current model to enhance
the accuracy of prediction models [7].
Next, this study will present some models employed by earlier researchers and discuss their
corresponding outcomes. Firstly, in the realm of E-Commerce market sales prediction, a newly
developed model known as the Seq2Seq and Transformer architecture using deep learning has
emerged to tackle the sales forecasting challenge for Corporación Favorita. This model not only
delivers the highest performance but also does so at the most cost-effective theoretical computational
expense. These findings indicate that, for this specific use case, the Seq2Seq trimmed model stands
out as the recommended choice, owing to its remarkable efficiency and effectiveness [8].
In the realm of E-commerce, it is worth noting that deep learning models are not the exclusive
choice. Machine learning, another widely adopted and popular approach, exhibits extensive
versatility, enabling its application not only in the E-commerce field but also in real-world
supermarket commerce [3, 6]. In the field of E-commerce, three other commonly used models,
including Incentive-Auto-Regressive-Integrated-Moving-Average (I-ARIMA), Long-Short-Term
Memory (LSTM), and Artificial-Neural-Network (ANN), are introduced. While the I-ARIMA model
falls under deep learning, LSTM and ANN are considered machine learning techniques. These three
approaches can accommodate varying accuracy requirements and different data types. Examining
diverse datasets from various E-Commerce domains indicates that LSTM excels over others,
including contemporary machine learning options, in terms of predictive precision. Upon comparing
343
Highlights in Science, Engineering and Technology IFMPT 2024
Volume 88 (2024)
these models, researchers have observed that machine learning models perform more accurately than
deep learning models. The lower accuracy is because conventional time-series analysis-based
methods primarily rely on business records, overlooking the spatial relationships between adjacent
retail locations [3]. In real-world supermarket business, four machine learning models are introduced:
XGBoost, ARIMAX, LSTM, and Facebook Prophet. Generally, XGBoost and LSTM exhibited
superior performance with lower errors. However, Facebook Prophet excelled in accuracy during
holidays, making it ideal for holiday forecasts. LSTM adapted swiftly to holiday dynamics, enhancing
performance.
Weather data did not significantly improve models and sometimes worsened results. Thus, results
are inconclusive, highlighting the model choice's dependence on time and forecasting goals [6].
Undoubtedly, as shown by the earlier examples, it is clear that machine learning models are more
suitable for forecasting sales. Some data could interfere with our model's performance when using a
dataset for predictions. In order to optimize our predictions, it is crucial to eliminate these disruptive
factors. After conducting several research, it was found that Lasso regression can serve this purpose
for optimization. Therefore, this paper applies the least absolute shrinkage and selection operator
regression model, also called as the LASSO regression model. The reason is that in predictive
research, the Lasso technique enhances predictions by shrinking regression coefficients and simplifies
models by setting some coefficients to zero [9]. By using LASSO, it would be a good fit to deal with
datasets where number of unknown parameters is relatively larger than observations [10]. The paper
is structured into five sections: Section 2 will introduce the data and methodology, section 3 will
discuss the results and provide analysis along with graphs, section 4 will explore limitations and
future outlooks, and section 5 will conclude the paper.
over the model's complexity. For our model, one has set the alpha parameter to 0.001. In addition to
the alpha parameter, Lasso regression inherits various other parameters from the base
'linear_model.Lasso' class in scikit-learn. In our current configuration, these parameters retain their
default settings, including 'fit_intercept' and 'normalize'. To assess the model's performance and
determine if Lasso regression is an appropriate choice, one employs three evaluation methods: R-
squared (R2), Root Mean Squared Error (RMSE), and Cross-Validation (CV). These metrics
collectively provide insights into the model's goodness of fit, predictive accuracy, and generalization
capability. In basic terms, as R2 approaches 1, it signals a stronger model fit (typically, R2>0.5 is
regarded as acceptable). As for RMSE, it quantifies the average prediction error, with lower RMSE
values signifying a more precise fit. To implement cross-validation effectively, such as using k-fold
cross-validation, assess the model's performance across diverse data subsets. This approach promotes
both consistency and acts as a safeguard against overfitting.
345
Highlights in Science, Engineering and Technology IFMPT 2024
Volume 88 (2024)
outliers within this subset of data. Outliers can have a significant impact on data accuracy and should
be addressed as part of the analysis process. After identifying variables with outliers, our aim is to
optimize and minimize the presence of these outliers. The approach employed involves calculating
the first quartile (Q1) corresponding to the 5th percentile and the third quartile (Q3) corresponding to
the 95th percentile. Subsequently, one computes the Interquartile Range (IQR) using the formula IQR
= Q3 - Q1. The dataset is then filtered to retain only the rows where the respective data points fall
within the range [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]. This step effectively eliminates outliers from the
target feature, enhancing data integrity and analysis accuracy. Following the optimization of outliers,
One can proceed to examine the correlations between variables. The visualizations presented below
provide a more intuitive way to understand the relationships between these variables and the target
variable. These Fig. 1 representations offer valuable insights into the associations between the
variables and the target variable.
A heatmap serves as an effective tool for visually illustrating the strength of correlations between
pairs of variables as well. When the correlation coefficient approaches 1, it signifies a robust and
positive relationship between the variables. Conversely, when the correlation coefficient nears 0, it
signifies a lack of discernible influence or a weak association between the variables. Seen from Fig.
2, one can easily identify the features that exhibit lower correlations with the target variable
(item_outlet_sales).
346
Highlights in Science, Engineering and Technology IFMPT 2024
Volume 88 (2024)
separated from the categorical values. When considering the year of establishment, one calculates the
number of years the outlet has been in operation by subtracting the provided year from the current
year (2023). This transformation provides a more continuous representation of the data, enhancing
our comprehension of it. Additionally, one converts all other variables into an 'object group' by
applying the ‘.astype('object')’ method to the respective strings. Regarding variables that have only
two distinct categorical responses, one can efficiently map these variables by assigning '0' or '1' to the
respective values. This mapping process allows us to represent these values as numerical values. In
the case of the remaining categorical variables, one employs a technique known as the creation of
dummy variables to convert them into numerical values. Dummy variables, also referred to as
indicators or binary variables, play a crucial role in statistical modeling and data analysis, particularly
in regression analysis and machine learning. These variables are utilized to represent categorical data,
facilitating the transformation of such variables into a numeric format that is compatible with various
algorithms and models.
After converting all values to numerical values, one now could apply the lasso regression model
to the new dataset. This study first splits the original dataset into train and test dataset. The training
dataset teaches and refines the machine learning model, while the test dataset checks how well the
model predicts new, unseen data. This split helps avoid the model from getting too specialized, where
it's great with the training data but falters with new information, ensuring it can provide useful
predictions in real-life situations. This study randomly assigned 80% of the data by the train dataset
and the rest 20% be the test dataset. Then applying GridSearchCV to modify model with optimal
value of alpha. After applying the lasso progression, this study plots and gets the comparison graph
for the train and test data. Fig. 3 illustrates a good fit for both the train and test data. The solid black
line represents the train data, while the blue dashed line represents the test data.
347
Highlights in Science, Engineering and Technology IFMPT 2024
Volume 88 (2024)
to R-squared and RMSE, this study has employed another essential evaluation technique to test the
performance of the model called Cross-Validation (CV). Cross-Validation is a method used to
evaluate the performance of the predictive model. The method partitions the dataset into different
sections, typically a training set and a validation set, and then repeatedly training and evaluating the
model. The goal is to attain a more precise assessment of the model's performance, particularly its
ability to make accurate predictions on new, unseen data. Our results reveal consistently robust R-
squared values across all folds, ranging from 0.8649 to 0.9286, with a mean R-squared value of
0.9001. These findings affirm the model's high accuracy and suitability for the task at hand.
5. Conclusion
To sum up, LASSO steps up as a solid choice for predicting sales. Our analysis revealed that it
adeptly sifted through the multitude of variables, 81 in total, to pinpoint the most influential features.
This ability to identify key predictors empowers us to allocate resources and efforts more efficiently.
In real estate data, it's normal to find features that are highly correlated such number of bedroom and
348
Highlights in Science, Engineering and Technology IFMPT 2024
Volume 88 (2024)
kitchen above ground, number of fireplaces in our dataset. Lasso handles this tricky stuff with finesse,
making sure researchers don't rely too much on linked things and making the model tougher and
easier to understand. Predicting property prices is crucial for both the real estate industry and
homebuyers. Lasso-based models provide data-driven guidance for property deals, helping buyers
and sellers make smarter pricing and negotiation decisions. It's worth noting that while Lasso excels
in many scenarios, it may not be the optimal choice when dealing with datasets characterized by
nonlinear relationships among variables. However, this serves as an opportunity to compare different
modeling approaches to select the one that best fits the dataset's unique characteristics. In summary,
leveraging Lasso for housing price prediction not only enhances forecasting accuracy but also
provides valuable insights into the pivotal factors influencing property prices. This approach signifies
the application of data science and machine learning to the realm of real estate, making them more
data-driven and intelligent.
References
[1] Rothe J T. Effectiveness of sales forecasting methods. Industrial Marketing Management, 1978, 7(2): 114-
118.
[2] Chen D, Liang W, Zhou K, et al. Sales Forecasting for Fashion Products Considering Lost Sales. Applied
Sciences, 2022, 12(14): 7081.
[3] Ajaykrishna S, Suganya T S, Rao B, et al. Online Sales Prediction in E-Commerce Market Using Machine
Learning. 2023 4th International Conference on Signal Processing and Communication (ICSPC). IEEE,
2023: 47-51.
[4] Wang C H, Gu Y W. Sales Forecasting, Market Analysis, and Performance Assessment for US Retail
Firms: A Business Analytics Perspective. Applied Sciences, 2022, 12(17): 8480.
[5] Saha P, Gudheniya N, Mitra R, et al. Demand forecasting of a multinational retail company using deep
learning frameworks. IFAC-PapersOnLine, 2022, 55(10): 395-399.
[6] Fredén D, Larsson H. Forecasting daily supermarkets sales with machine learning. CCSI, 2020.
[7] Dalrymple D J. Sales forecasting methods and accuracy. Business Horizons, 1975, 18(6): 69-73.
[8] Vallés-Pérez I, Soria-Olivas E, Martínez-Sober M, et al. Approaching sales forecasting using recurrent
neural networks and transformers. Expert Systems with Applications, 2022, 201: 116993.
[9] Musoro J Z, Zwinderman A H, Puhan M A, et al. Validation of prediction models based on lasso
regression with multiply imputed data. BMC medical research methodology, 2014, 14(1): 1-13.
[10] Gauraha N. Introduction to the lasso: A convex optimization approach for high-dimensional problems.
Resonance, 2018, 23(4): 439-464.
349