Time Series Forecasting - Rose - Business Report
Time Series Forecasting-Rose
Content:
Problem Statement
1. Read the data as an appropriate Time Series data and plot the data
2. Perform appropriate Exploratory Data Analysis to understand the data and perform decomposition
3. Split the data into training and test. The test data should start in 1991
4. Build all the exponential smoothing models on the training data and evaluate the model using RMSE on the test data. Other additional models such as regression, naïve forecast models, simple average models and moving average models should also be built on the training data and their performance checked on the test data using RMSE
5. Check for the stationarity of the data on which the model is being built using appropriate statistical tests and mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment
Note: Stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data using RMSE
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and evaluate this model on the test data using RMSE
8. Build a table with all the models built along with their corresponding parameters and the respective RMSE values on the test data
9. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands
10. Comment on the model thus built and report your findings and suggest the measures that the company should be taking for future sales
Figure Table:
1. Rose year-wise data plot
2. Month-wise data plot
3. Monthly sales across years
4. Time series plot
5. Empirical cumulative distribution
6. Average Rose sales and percent change
7. Multiplicative decomposition
8. Additive decomposition
9. Linear Regression Model
10. Linear Regression Model
11. Naïve Forecast
12. Simple average forecast
13. Moving Average
14. Plotting on the whole data
15. Plotting on both the Training and Test data
16. Simple Exponential Smoothing
17. Triple Exponential Smoothing (Holt-Winters Model)
18. Rose TES forecast
19. Rolling mean and standard deviation
20. Automated ARIMA
21. Automated SARIMA model
22. Log Data Autocorrelation (ACF)
23. Log Data Difference Autocorrelation (ACF)
24. Log Data Partial Autocorrelation (PACF)
25. Log Data Difference Partial Autocorrelation (PACF)
26. Manual ARIMA
27. Manual SARIMA
28. The forecast along with the confidence band
Problem Statement:
For this assignment, data on sales of different types of wine in the 20th century are to be analyzed. Both datasets come from the same company but cover different wines. As an analyst at ABC Estate Wines, you are tasked with analyzing and forecasting wine sales in the 20th century.
1. Read the data as an appropriate Time Series data and plot the data.
Solution:
In the plot above, there is a decreasing trend in the initial years, which stabilizes after a few years and then declines again.
We also observe seasonality in the data: the pattern appears to repeat every year.
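A minimal pandas sketch of reading the sales data as a monthly time series. The column names `YearMonth` and `Rose`, the `%Y-%m` date format and the three made-up rows are assumptions standing in for the real file, which this report does not show.

```python
import io

import pandas as pd


def load_monthly_series(csv_text, date_col="YearMonth", value_col="Rose", date_format="%Y-%m"):
    """Read CSV text into a pandas Series with a monthly DatetimeIndex."""
    df = pd.read_csv(io.StringIO(csv_text))
    # Parse the month labels (assumed to look like "1980-01") into timestamps
    df[date_col] = pd.to_datetime(df[date_col], format=date_format)
    # Index by date so pandas treats the column as a time series
    series = df.set_index(date_col)[value_col]
    # Declare the monthly frequency explicitly ("MS" = month start)
    return series.asfreq("MS")


# Made-up three-month excerpt; the real data file would be read instead
sample_csv = """YearMonth,Rose
1980-01,100
1980-02,90
1980-03,95
"""
rose = load_monthly_series(sample_csv)
print(rose)
# rose.plot() draws the line plot (requires matplotlib)
```

With a real file, `pd.read_csv` can take the file path directly instead of the `io.StringIO` wrapper.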
2. Perform appropriate Exploratory Data Analysis to understand the data and perform decomposition.
Yearly Boxplot
As observed in the time series plot, the year-wise box plots also indicate a downward trend.
The sales of Rose wine also show outliers in certain years.
December appears to have the highest sales of Rose wine, and there are also outliers in June,
July, August and September.
The year-wise and month-wise line plots of Rose wine sales above show that December has the
highest sales, while January, February and March have lower sales.
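A month-wise comparison like this one can be computed by grouping on the calendar month. A short pandas sketch; the series is synthetic (December raised and January to March lowered on purpose) and only shows the mechanics, not the report's data.

```python
import pandas as pd


def monthly_profile(series):
    """Average value for each calendar month (1 = January, ..., 12 = December)."""
    return series.groupby(series.index.month).mean()


# Synthetic two-year series: December raised and January-March lowered on purpose
idx = pd.date_range("1980-01-01", periods=24, freq="MS")
values = [100 + (50 if d.month == 12 else 0) - (20 if d.month in (1, 2, 3) else 0) for d in idx]
sales = pd.Series(values, index=idx, dtype=float)

profile = monthly_profile(sales)
print(profile.idxmax())                      # month with the highest average sales
print(profile.nsmallest(3).index.tolist())   # the three weakest months
```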
Additive-
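For the additive model, the residuals are centred around 0, and for the multiplicative model they are
centred around 1. This follows from the model forms: the additive model writes the series as
Trend + Seasonality + Residual, while the multiplicative model writes it as Trend × Seasonality × Residual.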
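A pure-Python sketch of classical decomposition that illustrates why the residuals centre on 0 in the additive case and on 1 in the multiplicative case. It assumes an even seasonal period (12 here), a 2×12 centred moving average for the trend and made-up inputs; in practice a library routine such as statsmodels' `seasonal_decompose` would normally be used.

```python
def decompose(y, period=12, model="additive"):
    """Classical decomposition; returns (trend, seasonal, residual) lists (None where undefined)."""
    if period % 2 or len(y) < 2 * period:
        raise ValueError("this sketch needs an even period and at least two full cycles")
    if model not in ("additive", "multiplicative"):
        raise ValueError("model must be 'additive' or 'multiplicative'")
    half = period // 2
    n = len(y)

    # Trend: centred moving average with half weights on the two end points
    trend = [None] * n
    for t in range(half, n - half):
        w = y[t - half:t + half + 1]  # period + 1 values
        trend[t] = (0.5 * w[0] + sum(w[1:-1]) + 0.5 * w[-1]) / period

    # Remove the trend: subtract (additive) or divide (multiplicative)
    detrended = {}
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            value = obs - tr if model == "additive" else obs / tr
            detrended.setdefault(t % period, []).append(value)

    # Seasonal index: average detrended value at each position in the cycle,
    # normalised to sum to 0 (additive) or average 1 (multiplicative)
    raw = [sum(detrended[p]) / len(detrended[p]) for p in range(period)]
    centre = sum(raw) / period
    index = [r - centre if model == "additive" else r / centre for r in raw]
    seasonal = [index[t % period] for t in range(n)]

    # Residual: whatever trend and seasonality do not explain
    resid = []
    for obs, tr, s in zip(y, trend, seasonal):
        if tr is None:
            resid.append(None)
        elif model == "additive":
            resid.append(obs - tr - s)
        else:
            resid.append(obs / (tr * s))
    return trend, seasonal, resid


pattern = [0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.9, 1.0, 1.1, 0.9, 0.9, 1.2]  # made-up seasonal shape
mult_series = [100 * pattern[t % 12] for t in range(48)]
add_series = [200 + 10 * (pattern[t % 12] - 1) for t in range(48)]

_, _, add_resid = decompose(add_series, model="additive")
_, _, mult_resid = decompose(mult_series, model="multiplicative")
valid_add = [r for r in add_resid if r is not None]
valid_mult = [r for r in mult_resid if r is not None]
print(sum(valid_add) / len(valid_add))    # close to 0
print(sum(valid_mult) / len(valid_mult))  # close to 1
```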
3. Split the data into training and test. The test data should start in 1991.
Solution:
Training set shape: (132, 1)
Test set shape: (55, 1)
The Rose wine sales training set contains the data up to 1990 (132 data points).
The test set contains the data from 1991 onwards (55 data points).
With this split, the models are trained on past years and evaluated on how well they predict the later years.
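Because the data carry a monthly date index, the split can be made by date. A pandas sketch, assuming the series starts in January 1980 (which gives exactly 132 training months up to December 1990) and therefore ends in July 1995 (55 test months); the values are placeholders.

```python
import pandas as pd

# Placeholder series spanning January 1980 to July 1995 (187 months);
# the values are dummies, only the dates matter for the split
idx = pd.date_range("1980-01-01", periods=187, freq="MS")
series = pd.Series(range(187), index=idx, dtype=float)

# Label-based slicing on a DatetimeIndex includes both endpoints
train = series.loc[:"1990-12-31"]   # January 1980 to December 1990
test = series.loc["1991-01-01":]    # January 1991 onwards

print(train.shape, test.shape)
```

Splitting by date rather than by shuffling keeps the test months strictly after the training months, which is what a forecast evaluation needs.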
4. Build all the exponential smoothing models on the training data and evaluate
the model using RMSE on the test data. Other models such as regression, naïve
forecast models and simple average models should also be built on the training
data and their performance checked on the test data using RMSE.
Solution:
Evaluate this model on the test data using Root Mean Squared Error (RMSE)
For Regression On Time forecast on the Test Data, RMSE is 15.269
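Regression on time fits sales as a straight line in a running time index and extrapolates that line into the test months. A pure-Python sketch, assuming the time index runs 1, 2, ... over the training months and simply continues into the test period; the function names and the perfectly linear made-up data are illustrative only.

```python
import math


def linear_trend_forecast(train, horizon):
    """Fit y = a + b*t by ordinary least squares on t = 1..n and extrapolate."""
    n = len(train)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(train) / n
    b = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, train)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    a = y_mean - b * t_mean
    # The test period continues the time index: t = n+1, ..., n+horizon
    return [a + b * (n + h) for h in range(1, horizon + 1)]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((x - p) ** 2 for x, p in zip(actual, predicted)) / len(actual))


sales = [120 - 0.5 * t for t in range(1, 25)]  # made-up, perfectly linear
train, test = sales[:18], sales[18:]
forecast = linear_trend_forecast(train, len(test))
print(rmse(test, forecast))  # essentially 0 for perfectly linear data
```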
NaiveModel_train
Naïve Model_test
Naïve Forecast-
In the plot above, the red line showing the naïve forecast is a straight line. The naïve model
assumes that tomorrow's sales equal today's sales, so every future period is forecast with the
last observed value.
Model Evaluation
For Naive forecast on the Test Data, RMSE is 79.719
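The naïve forecast and its test RMSE can be sketched in a few lines of pure Python; the numbers here are made up and only illustrate the calculation.

```python
import math


def naive_forecast(train, horizon):
    """Repeat the last observed training value for every future period."""
    return [train[-1]] * horizon


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((x - p) ** 2 for x, p in zip(actual, predicted)) / len(actual))


train = [90.0, 85.0, 95.0, 110.0]   # made-up training values
test = [100.0, 120.0]               # made-up test values
forecast = naive_forecast(train, len(test))
print(forecast)              # [110.0, 110.0]
print(rmse(test, forecast))  # 10.0
```

Because the whole forecast hinges on one observation, the naïve RMSE depends heavily on whether the last training month happens to be a high or a low month.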
In the plot above, the red line showing the simple average forecast is a straight line, because the
simple average model uses the average of the training sales values to forecast all future sales.
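A minimal pure-Python sketch of the simple average forecast, using made-up training values:

```python
def simple_average_forecast(train, horizon):
    """Forecast every future period with the mean of the training data."""
    mean = sum(train) / len(train)
    return [mean] * horizon


train = [90.0, 85.0, 95.0, 110.0]  # made-up training values
print(simple_average_forecast(train, 3))  # [95.0, 95.0, 95.0]
```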
Model Evaluation
For Rose sales, the moving average model was computed for 2-, 4-, 6- and 9-point intervals.
For 2 point Moving Average Model forecast on the Training Data, RMSE is 11.529
For 4 point Moving Average Model forecast on the Training Data, RMSE is 14.451
For 6 point Moving Average Model forecast on the Training Data, RMSE is 14.566
For 9 point Moving Average Model forecast on the Training Data, RMSE is 14.728
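The moving-average forecasts can be sketched in pure Python. The report does not say whether each average includes the month being forecast; this sketch assumes a trailing window of the `window` months before the forecast month, so each value is a genuine one-step-ahead forecast. The series and the `start` index of the test period are made up.

```python
import math


def moving_average_forecasts(series, window, start):
    """One-step-ahead forecasts for series[start:], each the mean of the previous `window` values."""
    if start < window:
        raise ValueError("start must leave at least `window` earlier observations")
    return [sum(series[t - window:t]) / window for t in range(start, len(series))]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((x - p) ** 2 for x, p in zip(actual, predicted)) / len(actual))


series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]  # made-up values
start = 4  # first index of the hypothetical test period
for k in (2, 4):
    fc = moving_average_forecasts(series, k, start)
    print(k, fc, rmse(series[start:], fc))
```

A longer window smooths more but reacts more slowly; on this rising series the 2-point average lags less than the 4-point one. If the average instead included the month being forecast (as pandas `rolling(k).mean()` does without a `shift`), each value would partly contain the actual it is compared with, which lowers the RMSE.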
Before building the various exponential smoothing models, we plot all the models built so far and
compare their time series plots.
For Alpha =0.995 Simple Exponential Smoothing Model forecast on the Test Data, RMSE is 36.796
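Simple exponential smoothing keeps one smoothed level, updated as level = α·y + (1 − α)·previous level, and forecasts every future month with the final level. A pure-Python sketch; initialising the level with the first observation is an assumption (libraries usually estimate the initial level).

```python
def simple_exp_smoothing(train, alpha, horizon):
    """Flat SES forecast: the final smoothed level repeated `horizon` times."""
    level = train[0]  # assumed initialisation: first observation
    for y in train[1:]:
        # Blend the new observation with the previous level
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


print(simple_exp_smoothing([10.0, 20.0], alpha=0.5, horizon=2))  # [15.0, 15.0]
```

A larger α weights recent months more heavily; α = 1 reproduces the last observation.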
Model Evaluation
Two parameters, α and β, are estimated in this model, which accounts for both level and trend.
Model Evaluation for Alpha = 0.68 and Beta = 0.0 (DES autofit model):
For Alpha =0.68 Double Exponential Smoothing Model forecast on the Test Data, RMSE is 15.707
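Double exponential smoothing (Holt's linear method) adds a trend state to the level. A pure-Python sketch of the two update equations; the initial level and trend (first value and first difference) are an assumed, simple initialisation, and the linear data are made up.

```python
def holt_linear(train, alpha, beta, horizon):
    """Holt's linear trend method: forecast = final level + h * final trend."""
    level, trend = train[0], train[1] - train[0]  # assumed initialisation
    for y in train[1:]:
        prev_level = level
        # Level: blend the observation with the previous level projected one step
        level = alpha * y + (1 - alpha) * (level + trend)
        # Trend: blend the latest change in level with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


train = [50.0 + 2.0 * t for t in range(10)]  # made-up, perfectly linear
print(holt_linear(train, alpha=0.3, beta=0.1, horizon=3))  # continues the line: about 70, 72, 74
```

With β = 0, as in the fit above, the trend equation never changes the trend, so the slope stays at its initial estimate.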
The model above was fitted with the parameters that Python selected automatically as the best for this
model; a brute-force search is used to choose them.
Model Evaluation for Alpha = 0.11, Beta = 0.7 and Gamma = 0.395 (TES autofit model):
For Auto-fit Triple Exponential Smoothing Model forecast on the Test Data, RMSE is 20.157
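Triple exponential smoothing (Holt-Winters) adds a seasonal state to Holt's level and trend. A pure-Python sketch of the additive Holt-Winters recursions with season length `m`; the report does not say whether its model used additive or multiplicative seasonality, and the initialisation from the first two seasons is a simple assumption (libraries usually estimate the initial states).

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters: level, trend and seasonal states updated once per observation."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialise")
    # Initial level: mean of the first season
    level = sum(y[:m]) / m
    # Initial trend: per-period change between the means of the first two seasons
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    # Initial seasonal effects: deviations of the first season from its mean
    season = [y[i] - level for i in range(m)]

    for t in range(m, len(y)):
        s_prev = season[t % m]  # seasonal effect from one cycle earlier
        prev_level = level
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s_prev

    n = len(y)
    # h-step forecast: extrapolated level and trend plus the matching seasonal effect
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]


history = [10.0, 20.0, 30.0, 40.0] * 5  # made-up, purely seasonal with period 4
print(holt_winters_additive(history, m=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4))
# reproduces the repeating pattern (about 10, 20, 30, 40)
```

For monthly sales such as Rose, `m` would be 12.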
5. Check for the stationarity of the data on which the model is being built
using appropriate statistical tests, and also mention the hypothesis for the
statistical test. If the data is found to be non-stationary, take appropriate steps
to make it stationary. Check the new data for stationarity and comment. Note:
Stationarity should be checked at alpha = 0.05.
Solution:
We check the stationarity of the Rose sales data at alpha = 0.05. The null hypothesis is that the
data is non-stationary; the alternative hypothesis is that it is stationary.
The following result table shows that the p-value is greater than alpha.
Hence we fail to reject the null hypothesis that the data is non-stationary.
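The decision rule can be illustrated with a simplified Dickey-Fuller regression, Δy(t) = c + γ·y(t−1) + e(t). A NumPy sketch under stated simplifications: it uses no lagged-difference terms and compares the t-statistic of γ with the asymptotic 5% critical value of about −2.86 for a model with a constant. In practice `statsmodels.tsa.stattools.adfuller` adds lagged differences and reports a p-value to compare with 0.05. The data are simulated.

```python
import numpy as np


def dickey_fuller_tstat(y):
    """t-statistic of gamma in dy_t = c + gamma * y_{t-1} + e_t (no lagged differences)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, _, _, _ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    dof = len(dy) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[1] / np.sqrt(cov[1, 1]))


CRITICAL_5PCT = -2.86  # asymptotic 5% Dickey-Fuller critical value, constant-only model

rng = np.random.default_rng(0)
noise = rng.normal(size=300)   # stationary by construction
walk = np.cumsum(noise)        # random walk: has a unit root by construction

for name, series in (("white noise", noise), ("random walk", walk), ("differenced walk", np.diff(walk))):
    stat = dickey_fuller_tstat(series)
    verdict = "reject H0 (stationary)" if stat < CRITICAL_5PCT else "fail to reject H0 (non-stationary)"
    print(f"{name}: t = {stat:.2f} -> {verdict}")
```

Differencing the random walk recovers white noise, which is why differencing is the usual step for making such a series stationary.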
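6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data using RMSE.
Solution: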
Predict on the Test Set using this model and evaluate the model.
RMSE: 37.30647971852104
MAPE: 76.93545693305195
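Both error measures reported here can be written out directly. A pure-Python sketch with made-up values; MAPE is expressed in percent and is undefined when an actual value is 0.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the units of the data."""
    return math.sqrt(sum((x - p) ** 2 for x, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (undefined if any actual value is 0)."""
    return 100 * sum(abs((x - p) / x) for x, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 50.0]      # made-up values
predicted = [110.0, 40.0]
print(rmse(actual, predicted), mape(actual, predicted))
```

RMSE is in sales units, so it can be compared across models on the same test set; MAPE is relative, so the same absolute error counts more in low-sales months.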
Build an Automated version of a SARIMA model for which the best parameters are selected in
accordance with the lowest Akaike Information Criteria (AIC).
Automated SARIMA-
The ACF plot of Rose sales shows seasonality at lag 12, so we run the automated SARIMA search
with a seasonal period of 12.
We sorted the resulting AIC values from lowest to highest.
We then build the SARIMA model with the lowest Akaike Information Criteria (AIC).
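The automated search is a loop over candidate orders that keeps every AIC and sorts them. A pure-Python sketch, assuming the candidates form a grid of (p, d, q) and seasonal (P, D, Q) orders with period 12; the fitting function is passed in, so the toy scoring function below is only a placeholder for a real SARIMA fit, and the grid ranges are illustrative.

```python
import itertools


def select_by_aic(fit_aic, pdq_values, seasonal_values, period):
    """Try every (p,d,q)x(P,D,Q,period) combination; return (aic, order, seasonal_order) sorted by AIC.

    `fit_aic(order, seasonal_order)` must fit one model and return its AIC."""
    results = []
    for order in itertools.product(*pdq_values):
        for seasonal in itertools.product(*seasonal_values):
            seasonal_order = (*seasonal, period)
            results.append((fit_aic(order, seasonal_order), order, seasonal_order))
    return sorted(results)  # lowest AIC first


def toy_aic(order, seasonal_order):
    """Toy stand-in for a real fit: it only penalises larger orders."""
    return 100 + sum(order) + 2 * sum(seasonal_order[:3])


# With statsmodels (not run here), fit_aic could be:
#   from statsmodels.tsa.statespace.sarimax import SARIMAX
#   def fit_aic(order, seasonal_order):
#       return SARIMAX(train, order=order, seasonal_order=seasonal_order).fit(disp=False).aic
ranked = select_by_aic(toy_aic, (range(3), range(2), range(3)), (range(2), range(2), range(2)), period=12)
print(ranked[0])  # (100, (0, 0, 0), (0, 0, 0, 12))
```

With real fits, some combinations may fail to converge; wrapping the call in a `try`/`except` and skipping those candidates is a common refinement.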
Predict on the Test Set using this model and evaluate the model.
Predict on the Test Set using this model and evaluate the model.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on
the training data and evaluate this model on the test data using RMSE.
Solution:
Manual ARIMA-
We built a manual ARIMA model for Rose sales based on the ACF and PACF plots.
Based on the plots below, we choose the autoregressive order p = 5, the moving-average order q = 2
and the differencing order d = 1.
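The cut-offs can also be read numerically: the PACF cut-off suggests p and the ACF cut-off suggests q. A pure-Python sketch computing the sample ACF, the PACF through the Durbin-Levinson recursion, and the number of leading lags outside the ±1.96/√n band (the usual large-sample white-noise bound). The helper names are hypothetical and the alternating series is made up; in practice statsmodels' `acf`, `pacf`, `plot_acf` and `plot_pacf` are typically used.

```python
import math


def acf(y, nlags):
    """Sample autocorrelations for lags 0..nlags."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom for k in range(nlags + 1)]


def pacf_from_acf(r):
    """Partial autocorrelations for lags 0..len(r)-1 via the Durbin-Levinson recursion."""
    pacf = [1.0]
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, len(r)):
        if k == 1:
            phi_kk = r[1]
        else:
            num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def cutoff_lag(values, n):
    """Number of consecutive lags, starting at lag 1, outside the +/-1.96/sqrt(n) band."""
    bound = 1.96 / math.sqrt(n)
    k = 0
    while k + 1 < len(values) and abs(values[k + 1]) > bound:
        k += 1
    return k


series = [1.0, -1.0] * 50  # made-up alternating series, n = 100
r = acf(series, nlags=10)
p = pacf_from_acf(r)
print(cutoff_lag(r, len(series)), cutoff_lag(p, len(series)))  # 10 1
```

Here the PACF cuts off after lag 1 while the ACF decays slowly, the textbook signature of an AR(1)-type process.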
The data shows seasonality, so a SARIMA model should give better accuracy.
We built a manual SARIMA model for Rose sales based on the ACF and PACF plots.
Based on the plots below, we again choose the autoregressive order p = 5, the moving-average order q = 2
and the differencing order d = 1. We then derived the seasonal parameters from the seasonal cut-offs.
Now there is almost no trend left in the data; only seasonality remains.
We check the stationarity of this series before fitting the SARIMA model.
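Trend and seasonality are usually removed by differencing: a lag-1 difference removes a linear trend, and a lag-12 (seasonal) difference removes a pattern that repeats every year. A pure-Python sketch; the report does not state exactly which differences it applied here, so the example simply shows both on a made-up trend-plus-seasonality series.

```python
def difference(y, lag=1):
    """Return y[t] - y[t - lag] for t = lag, ..., len(y) - 1."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


pattern = [5, -3, 0, 2, 8, -6, 1, 4, -2, 7, -9, 3]  # made-up yearly shape
y = [100 - 0.5 * t + pattern[t % 12] for t in range(48)]  # downward trend plus seasonality

seasonal_diff = difference(y, lag=12)       # seasonality cancels; only the trend step is left
both = difference(seasonal_diff, lag=1)     # the remaining constant cancels too
print(set(seasonal_diff), set(both))
```

Each difference shortens the series by its lag, so 48 months become 36 after the seasonal difference and 35 after the additional first difference.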
Checking the ACF and the PACF plots for the new modified Time Series.
Predict on the Test Set using this model and evaluate the model.
8. Build a table (create a data frame) with all the models built along with their
corresponding parameters and the respective RMSE values on the test data.
Solution:
We consolidated the test results of the various models built to forecast future Rose wine sales
and sorted their test RMSE scores.
The lowest score is 9.88.
It was obtained from the triple exponential smoothing model run over different smoothing level
(Alpha), smoothing trend (Beta) and smoothing seasonality (Gamma) values ranging
from 0.3 to 1.0.
The RMSE scores show that triple exponential smoothing works best for the Rose sales data, which
has both seasonality and trend.
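A pandas sketch of how such a comparison table can be assembled and sorted. It uses only the test-set RMSE values quoted in this report; models without a quoted test RMSE are left out, and the model labels are descriptive names chosen for this sketch.

```python
import pandas as pd

# Test-set RMSE values quoted in this report
results = pd.DataFrame(
    {
        "Model": [
            "Regression on time",
            "Naive forecast",
            "SES (alpha = 0.995)",
            "DES autofit (alpha = 0.68, beta = 0.0)",
            "TES autofit (alpha = 0.11, beta = 0.7, gamma = 0.395)",
            "Automated ARIMA",
            "TES (best of parameter search)",
        ],
        "Test RMSE": [15.269, 79.719, 36.796, 15.707, 20.157, 37.306, 9.88],
    }
).set_index("Model")

# Lowest (best) RMSE first
results = results.sort_values("Test RMSE")
print(results)
```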
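9. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands.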
Prediction plot-
The upper and lower confidence bands were calculated at the 95% confidence level.
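The report does not state how its bands were computed. One simple approach, sketched here under the assumption of roughly normal, constant-variance errors, places the 95% band at the forecast ± 1.96 residual standard deviations. Model-based intervals (for example from a state-space fit) generally widen with the forecast horizon, which this constant-width band ignores. The forecast and residual values are made up.

```python
import statistics


def constant_width_band(forecast, residuals, z=1.96):
    """Lower/upper bands: forecast +/- z times the sample standard deviation of the residuals."""
    spread = z * statistics.stdev(residuals)
    lower = [f - spread for f in forecast]
    upper = [f + spread for f in forecast]
    return lower, upper


forecast = [80.0, 95.0, 110.0]            # made-up 3-month forecast
residuals = [-4.0, 3.0, -1.0, 2.0, 0.0]   # made-up in-sample errors
lower, upper = constant_width_band(forecast, residuals)
for f, lo, hi in zip(forecast, lower, upper):
    print(f"{lo:.1f} <= {f:.1f} <= {hi:.1f}")
```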
10. Comment on the model thus built and report your findings and suggest the
measures that the company should be taking for future sales.
Solution:
Inference and Recommendation –
Rose sales show a declining trend compared with previous years.
December is the month with the highest sales, although December sales have come down over the
years from 1980 to 1994.
Rose wine sales are seasonal and also have a trend, so the company cannot hold the same stock
throughout the year.
The predictions can help plan stock requirements based on the forecast sales.
The company should use the prediction results to capitalize on the high-demand season and
ensure it sources and supplies enough stock to meet that demand.
The company should also use the prediction results to plan stock for the low-demand season in line
with demand.
Thank You…!