
OM Forecasting

The document discusses key concepts in forecasting including: 1. Forecasting involves making educated guesses about future demand or other factors based on past data and analysis. All forecasts will contain some level of error or uncertainty. 2. Forecasts are generally more accurate over shorter time horizons compared to longer time horizons, as there are fewer uncertainties to contend with. Forecasts are also typically more accurate when made for groups of items rather than individual items. 3. The forecasting process involves determining the purpose and time horizon of the forecast, analyzing relevant data, selecting and testing forecasting models, generating the forecast, and monitoring forecast errors to refine models over time.


Forecasting

➢ Forecast:
➢ An educated guess for the future based on objective analysis to the
extent possible.
➢ Despite the presence of inaccuracies, some forecast is better
than no forecast.
➢ Analytical forecast is better than intuitive forecast.
➢ What can be forecast?
➢ Demand - Common
➢ Environmental - Social, Political, Economic
➢ Sales characteristics: RCTS
➢ Users:
➢ Long range – Facilities
➢ Medium range – Production Planning
➢ Short range – Product Forecast
Principles of Forecasting

➢ Forecasts are rarely perfect.


➢ Any prediction involves some uncertainty.
➢ Goal of forecasting is to keep the forecast error as low
as possible.
➢ Forecasts are more accurate for groups of items
than for individual items.
➢ Forecasting errors among items in a group usually
have a cancelling effect.
➢ Forecasts are more accurate for shorter time
horizons than longer ones.
➢ Short-range forecasts must contend with fewer
uncertainties than longer-range forecasts.
Steps in the Forecasting Process

➢ Determine the purpose of the forecast


➢ Establish a time horizon
➢ Evaluate and analyse appropriate data
➢ Select and test the forecasting model
➢ Generate the forecast
➢ Monitor the forecast errors
➢ Use the error feedback to further refine the model
Time Horizon:
1. Long term
2. Medium term
3. Short term

Uses:
1. Process Design
2. Capacity Planning
3. Aggregate Planning
4. Scheduling
5. Inventory Management

Forecasting Methods:
➢ Qualitative Methods: 1. Market Research, 2. Panel Consensus, 3. Delphi, 4. Historical Analogy
➢ Causal: 1. Regression, 2. Econometric, 3. Input-output, 4. Simulation
➢ Time-series: 1. Moving Average, 2. Exponential Smoothing, 3. Box-Jenkins Approach
Selection of Forecasting Method

1. User and system sophistication

2. Time and Resources available

3. Use or decision characteristics

4. Data availability (and accuracy)

5. Data pattern (plot on graph and check)


Selection of Forecasting Method

➢ Best fit is not necessarily best predictor.

➢ Do in 2 steps
➢ First fit various models,

➢ Then choose one with least error in the


training set and predict for the test set.
Selection of Forecasting
Method
Sometimes, human beings do better than
quantitative forecasts !!!
Time Series Methods of Forecasting
➢Moving Average
➢ Simple Moving Average
➢ Weighted Moving Average
➢Exponential Smoothing
➢ Simple Exponential Smoothing
➢ Holt’s Method (Double Exponential
Smoothing)
➢ Holt-Winters’ Method (Triple Exponential
Smoothing)
Typical Components of a Time Series
(Figure: demand plotted against time, showing the typical components of a time series: level, linear trend, seasonal pattern, cyclic pattern, and a random component.)
Simple Moving Average
➢ An averaging period “𝑛” is selected.

➢ The forecast for the next period (t + 1) is the arithmetic mean of the n most recent actual demands (here, t ≥ n):

F_{t+1} = (1/n) · Σ_{i=t−n+1}^{t} X_i

➢ It is called a “simple” average because each observation used to


compute the average is equally weighted.
Simple Moving Average
➢ It is called “moving” because, as demand in
period 𝑡 + 1 becomes available, the demand in
period 𝑡 + 1 − 𝑛 is no longer used to make the
forecast.

➢ By increasing n, the forecast becomes less responsive to fluctuations in demand (low impulse response).

➢ By decreasing n, the forecast becomes more responsive to fluctuations in demand (high impulse response).
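
The mechanics above can be captured in a small Python sketch covering both the simple and the weighted moving average; the demand values and weights below are illustrative only.

def moving_average_forecast(demand, n, weights=None):
    """Forecast for the next period from the last n observations.
    With weights=None each observation gets weight 1/n (simple MA);
    otherwise weights (oldest first, most recent last) should sum to 1."""
    window = demand[-n:]
    if weights is None:
        return sum(window) / n                                # simple moving average
    return sum(w * x for w, x in zip(weights, window))        # weighted moving average

demand = [100, 110, 120, 130, 125, 115]                       # illustrative demand history
print(moving_average_forecast(demand, 3))                     # simple 3-period MA forecast
print(moving_average_forecast(demand, 3, [0.2, 0.3, 0.5]))    # recent periods weighted more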
Understanding Simple Moving Average
Simple Moving Average
Month, Demand, 2-Month MA, 3-Month MA, 5-Month MA, 10-Month MA

Jan 100
Feb 110
Mar 120 105
Apr 130 115 110.00
May 125 125 120.00
Jun 115 127.5 125.00 117
Jul 105 120 123.33 120
Aug 95 110 115.00 119
Sep 85 100 105.00 114
Oct 87.5 90 95.00 105
Nov 97.5 86.25 89.17 97.5 107.25
Dec 107.5 92.5 90.00 94 107
Jan 117.5 102.5 97.50 94.5 106.75
Feb 127.5 112.5 107.50 99 106.5
Mar 137.5 122.5 117.50 107.5 106.25
Apr 132.5 132.5 127.50 117.5 107.5
May 122.5 135 132.50 124.5 109.25
Jun 112.5 127.5 130.83 127.5 111
Jul 102.5 117.5 122.50 126.5 112.75
Understanding Simple Moving Average
(Figure: plot of the demand series together with the 2-, 3-, 5- and 10-month moving averages; the longer the averaging period, the smoother and less responsive the curve.)
Weighted Moving Average

➢ This is a variation on the simple


moving average.
➢ Weights used to compute the average are not equal:
F_{t+1} = Σ_{i=t−n+1}^{t} w_i·X_i, where w_i is the weight given to the demand in period i.

➢ This allows more recent demand


data to have a greater effect on the
moving average and, therefore, the
forecast.
Weighted Moving Average

➢ The weights sum to 1 and, in general, demands in more recent periods are given higher weightage.

➢ The distribution of the weights determines the impulse response of the forecast.
So, in summary, the Moving Average method has the following
properties:

➢ It smoothens the curve.

➢ Only the last 𝑛 observations are considered to make the


forecast for the next period.

➢ It lags behind any trend in the data.

➢ Forecast can be made only for next immediate period.


Exponential Smoothing

➢ The Exponential Smoothing (ES) technique


forecasts the next value using a weighted
average of all previous values, where the
weights decay exponentially from the most
recent to the oldest historical value.

➢ The forecast is the sum of the old forecast and


a portion of the forecast error.
Simple Exponential Smoothing Method

➢ The exponential smoothing equation is


𝐹𝑡+1 = 𝛼𝑦𝑡 + (1 − 𝛼)𝐹𝑡

➢ 𝑭𝒕+𝟏 = forecast for the next period 𝒕 + 𝟏


➢ 𝜶 = smoothing parameter
➢ 𝒚𝒕 = actual response in period t
➢ 𝑭𝒕 = forecast made for period t.

➢ The forecast 𝐹𝑡+1 is made by assigning a weight


of 𝛼 to the most recent observation 𝑦𝑡 and a
weight of (1 − 𝛼) to the most recent forecast 𝐹𝑡 .
F_{t+1} = α·y_t + (1 − α)·F_t
F_{t+1} = α·y_t + (1 − α)[α·y_{t−1} + (1 − α)·F_{t−1}]
F_{t+1} = α·y_t + (1 − α)·α·y_{t−1} + (1 − α)²·F_{t−1}

Similarly, we keep replacing forecast terms (F) with observed terms (y):
F_{t+1} = α·y_t + (1 − α)·α·y_{t−1} + (1 − α)²·α·y_{t−2} + … + (1 − α)ⁿ·α·y_{t−n} + … + (1 − α)^{t−1}·F_2

We can initialize F_2 = y_1, i.e., the response in the 1st period is taken as the forecast for the 2nd period:

F_{t+1} = α·y_t + (1 − α)·α·y_{t−1} + (1 − α)²·α·y_{t−2} + … + (1 − α)ⁿ·α·y_{t−n} + … + (1 − α)^{t−1}·y_1


We can notice that, as more and more
previous demand points are considered,
the sum of coefficients approaches 1.
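
A minimal Python sketch of this recursion, initialised with F_2 = y_1; the demand list is the drug-inventory series used in Example 1 below, so the last value printed is the month-33 forecast.

def simple_exponential_smoothing(y, alpha):
    """Return the one-step-ahead forecasts F_2, F_3, ..., F_{n+1}, with F_2 = y_1."""
    forecasts = [y[0]]                              # F_2 initialised to the first observation
    for observation in y[1:]:
        forecasts.append(alpha * observation + (1 - alpha) * forecasts[-1])
    return forecasts

demand = [78, 65, 90, 71, 80, 101, 84, 60, 73]      # months 24-32
print(round(simple_exponential_smoothing(demand, 0.2)[-1]))   # about 77, the month-33 forecast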
Example - 1
A hospital wishes to predict the monthly requirements
for drug and surgical dressing inventory. The actual
demand for one such item is shown in the below given
table. Develop a forecasting model based on
exponential smoothing and predict the demand in
month 33.

Month 24 25 26 27 28 29 30 31 32

Demand (X) 78 65 90 71 80 101 84 60 73


Solution
• To calculate the forecast for period 𝑡 + 1,
demand in period 𝑡 and forecast for period 𝑡 are
needed.

• Demand in the 1𝑠𝑡 period (month 24) is taken as


the forecast for the 2𝑛𝑑 period (month 25), i.e.,
𝐹2 = 𝑦1 = 78.

• Here, the calculations for 4 different values of


the smoothing parameter 𝛼 are shown.
Solution
• In practice, the value of 𝛼 is fixed in such a way
so as to minimize the given measure of forecast
error.

• Forecast for month 33, i.e., period 10 at 𝛼 = 0.2 is


calculated as follows:

F_10 = α·y_9 + (1 − α)·F_9

F_10 = (0.2)(73) + (1 − 0.2)(78)

F_10 = 77 units
Uses of Forecast Error
➢ To set safety stocks or safety capacity
➢ To ensure a desired level of protection against stock out

➢ To monitor erratic demand observations or outliers


➢ Should be carefully evaluated
➢ Perhaps rejected from the data
➢ To determine when the forecasting method is no longer
tracking actual demand and needs to be reset
➢ Monitor validity of forecasts
Measures of Forecast Error

1. Cumulative sum of Forecast Errors (CFE):

CFE = Σ_{t=1}^{n} e_t = Σ_{t=1}^{n} (y_t − F_t)

where e_t is the forecast error in period t,
y_t is the actual response (observation) in period t,
F_t is the forecast for period t, and
n is the number of time-series observations.
Measures of Forecast Error
2. Mean Percentage Error (MPE):

MPE = (1/n) Σ_{t=1}^{n} (e_t / y_t)

3. Mean Absolute Deviation (MAD):

MAD = (1/n) Σ_{t=1}^{n} |e_t|

4. Mean Absolute Percentage Error (MAPE):

MAPE = (1/n) Σ_{t=1}^{n} |e_t| / y_t
Measures of Forecast Error
5. Mean Squared Error (MSE):

MSE = (1/n) Σ_{t=1}^{n} e_t²

6. Root Mean Squared Error (RMSE):

RMSE = √[ (1/n) Σ_{t=1}^{n} e_t² ]
Measures of Forecast Error
7. Tracking Signal (TS):

TS_n = CFE_n / MAD_n = Σ_{t=1}^{n} e_t / [ (1/n) Σ_{t=1}^{n} |e_t| ]

• A tracking signal monitors forecasts and warns when there are systematic departures of the outcomes from the forecasts.
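
These measures can be computed with a short Python sketch; the forecast list below uses the rounded α = 0.1 forecasts from the tracking-signal table that follows, so the printed values only approximately match the summary table.

import math

def error_measures(actual, forecast):
    """CFE, MAD, MAPE, MSE, RMSE and tracking signal for paired actual/forecast values."""
    errors = [y - f for y, f in zip(actual, forecast)]
    n = len(errors)
    cfe = sum(errors)
    mad = sum(abs(e) for e in errors) / n
    mape = sum(abs(e) / y for e, y in zip(errors, actual)) / n
    mse = sum(e * e for e in errors) / n
    return {"CFE": cfe, "MAD": mad, "MAPE": mape,
            "MSE": mse, "RMSE": math.sqrt(mse), "TS": cfe / mad}

actual   = [65, 90, 71, 80, 101, 84, 60, 73]        # months 25-32
forecast = [78, 77, 78, 77, 78, 80, 80, 78]         # rounded forecasts at alpha = 0.1
print(error_measures(actual, forecast))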
Example

For the same exponential smoothing problem,


the different measures of forecast error have
been summarized as follows:
Forecast Errors Summary
α 0.1 0.2 0.3 0.4
CFE -2 -5 -7 -9
MPE -3.12% -3.65% -4.10% -4.42%
MAD 11.14 11.37 11.60 12.16
MAPE 14.65% 15.05% 15.41% 16.15%
MSE 176.093 191.974 207.323 222.305
RMSE 13.270 13.855 14.399 14.910
TS -0.2006 -0.4255 -0.6143 -0.7204
CFE - Cumulative sum of Forecast Errors
MPE - Mean Percentage Error
MAD - Mean Absolute Deviation
MAPE - Mean Absolute Percentage Error
MSE - Mean Squared Error
RMSE - Root Mean Squared Error
TS - Tracking Signal
Tracking Signal calculation after every forecast
Month (t), Demand, Forecast F_t (α = 0.1), Error, Abs. Error, % Error, Abs. % Error, MAD_t, TS_t

24 78
25 65 78 -13 13 -0.20 0.20 13.00 -1.00
26 90 77 13 13 0.15 0.15 13.15 0.02
27 71 78 -7 7 -0.10 0.10 11.11 -0.61
28 80 77 3 3 0.03 0.03 9.00 -0.45
29 101 78 23 23 0.23 0.23 11.88 1.63
30 84 80 4 4 0.05 0.05 10.58 2.21
31 60 80 -20 20 -0.34 0.34 11.97 0.26
32 73 78 -5 5 -0.07 0.07 11.14 -0.20
𝑀𝐴𝐷𝑡- Mean Absolute Deviation up to period 𝑡
𝑇𝑆𝑡 - Tracking Signal up to period 𝑡
Holt’s Method (Double Exponential
Smoothing)

➢ One of the drawbacks of simple exponential smoothing


is its limited proficiency in handling trend and
seasonality.

➢ In 1957, professor Charles C. Holt extended the basic


exponential smoothing method by introducing an
additional smoothing equation for trend.

➢ Thus, Holt’s method has two smoothing equations,


one for estimating the level of the series and one for
estimating the trend (hence the term ‘double’
exponential smoothing).
Holt’s Method (Double Exponential
Smoothing)
➢ The level and trend estimates for the current period are
then combined to forecast the response for the next
period.

➢ It is also possible to make a forecast ‘𝑚’ periods into the


future from the current level and trend estimates.

➢ The two smoothing parameters, 𝛼 (level) and 𝛽 (trend)


control how responsive the forecasts are to recent
observations.

➢ Setting 𝛽 and the initial trend estimate to zero reduces


Holt’s method to simple exponential smoothing.
Holt’s Method (Double Exponential
Smoothing)
➢ Three equations are used in the model:
➢ The level estimate:
L_t = α·y_t + (1 − α)(L_{t−1} + b_{t−1})
➢ The trend estimate:
b_t = β(L_t − L_{t−1}) + (1 − β)·b_{t−1}
➢ Forecast m periods into the future:
F_{t+m} = L_t + m·b_t
where,
➢ 𝐿𝑡 = estimate of the level of the series at time 𝑡
➢ 𝛼 = smoothing parameter for the level estimate
➢ 𝑦𝑡 = new observation or actual value of series in period 𝑡
➢ 𝛽 = smoothing parameter for trend estimate
➢ 𝑏𝑡 = estimate of the trend of the series at time 𝑡
➢ 𝑚 = no. of periods to be forecast into the future
Holt’s Method (Double Exponential
Smoothing)
• The smoothing parameters 𝛼 and 𝛽 can be selected
subjectively, or by minimizing a measure of forecast error.

• The initialization process for Holt’s method requires two


estimates, 𝐿1 and 𝑏1 .

• It is common practice to set L_1 = y_1, while b_1 may be initialized in any one of the following ways:
b_1 = y_2 − y_1, or
b_1 = (y_4 − y_1) / 3, or
b_1 = 0
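
Holt's recursion, with this initialisation (L_1 = y_1, b_1 = 0), can be sketched in a few lines of Python; the sales list is the 16-month series from the worked example that follows.

def holt_forecast(y, alpha, beta, m=1):
    """Double exponential smoothing: m-period-ahead forecast made at the end of the series."""
    level, trend = y[0], 0.0                        # L_1 = y_1, b_1 = 0
    for observation in y[1:]:
        prev_level = level
        level = alpha * observation + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + m * trend

sales = [150, 190, 160, 240, 220, 270, 290, 280,
         340, 330, 290, 380, 350, 440, 520, 550]    # months 1-16
print(round(holt_forecast(sales, 0.3, 0.1, m=1), 2))   # month 17, about 504.69
print(round(holt_forecast(sales, 0.3, 0.1, m=2), 2))   # month 18, about 525.66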
Holt’s method: Example
• Fit a suitable smoothing model for the given time series and predict the sales in months 17 and 18.

Month, t   Sales, y_t
1          150
2          190
3          160
4          240
5          220
6          270
7          290
8          280
9          340
10         330
11         290
12         380
13         350
14         440
15         520
16         550
Holt’s method: Example

• From a plot of the sales data, we can clearly see


the presence of an increasing trend and no
discernible seasonal variations.

• Hence, it is appropriate to use Holt’s method for


this problem.

• To initialize the level and trend estimates, we use


𝐿1 = 𝑦1 = 150 and 𝑏1 = 0.

• The smoothing parameters are assigned the


following values: 𝛼 = 0.3, 𝛽 = 0.1.
Holt’s method: Results
Here, 𝛼 = 0.3, 𝛽 = 0.1
𝑚 = 1 (unless otherwise stated)
Month, 𝒕 Sales, 𝒚𝒕 𝑳𝒕 𝒃𝒕 𝑭𝒕+𝒎
1 150 150.00 0.00
2 190 162.00 1.20 150.00
3 160 162.24 1.10 163.20
4 240 186.34 3.40 163.34
5 220 198.82 4.31 189.74
6 270 223.19 6.32 203.13
7 290 247.66 8.13 229.51
8 280 263.05 8.86 255.79
9 340 292.34 10.90 271.91
10 330 311.27 11.70 303.24
11 290 313.08 10.71 322.97
12 380 340.66 12.40 323.79
13 350 352.14 12.31 353.06
14 440 387.11 14.58 364.45
15 520 437.18 18.13 401.69
16 550 483.72 20.97 455.31
17 504.69 (m = 1)
18 525.66 (m = 2)
Sample Calculations
• Level estimate: L_t = α·y_t + (1 − α)(L_{t−1} + b_{t−1})

L_2 = 0.3 × 190 + (1 − 0.3)(150 + 0) = 162

• Trend estimate: b_t = β(L_t − L_{t−1}) + (1 − β)·b_{t−1}

b_2 = 0.1 × (162 − 150) + (1 − 0.1) × 0 = 1.2

• Forecast made at the end of period t for period t + m: F_{t+m} = L_t + m·b_t

• Forecast made at the end of period 2 for period 3 (at t = 2, m = 1):

F_3 = F_{2+1} = L_2 + 1 × b_2 = 162 + 1 × 1.2 = 163.2


Sample Calculations

Forecast for months 17 and 18:

• Forecast made at the end of period 16 for month 17, (At


𝑡 = 16, 𝑚 = 1)

𝐹17 = 𝐹16+1 = 𝐿16 + 1 ∗ 𝑏16 = 483.72 + 1 ∗ 20.97 = 𝟓𝟎𝟒. 𝟔𝟗

• Forecast made at the end of period 16 for period 18 (At


𝑡 = 16, 𝑚 = 2)

𝐹18 = 𝐹16+2 = 𝐿16 + 2 ∗ 𝑏16 = 483.72 + 2 ∗ 20.97 = 𝟓𝟐𝟓. 𝟔𝟔


Holt-Winters’ Method (Triple
Exponential Smoothing)
➢ In 1960, Peter Winters, a student of professor Holt,
further extended Holt’s method by introducing an
additional smoothing equation for seasonality. This
seasonality may be multiplicative or additive.

➢ Therefore, Holt-Winters’ method has three smoothing


equations, one each for estimating the level, trend and
seasonal component (hence the term ‘triple’
exponential smoothing).

➢ The three smoothing parameters, 𝛼 (level), 𝛽 (trend) and


𝛾 (seasonal component) control how responsive the
forecasts are to recent observations.
Holt-Winters’ Method
Notations:
• 𝑦𝑡 - Actual observation in period 𝑡
• 𝐿𝑡 - Level estimate for period 𝑡
• 𝑏𝑡 - Trend estimate for period 𝑡
• 𝑆𝑡 - Seasonal Component estimate for period 𝑡
• 𝛼 - Smoothing Parameter for the Level of the series
• 𝛽 - Smoothing Parameter for Trend
• 𝛾 - Smoothing Parameter for the Seasonal Component
• 𝑝 - Number of seasons in one complete seasonal cycle
• 𝐹𝑡+𝑚 - Forecast made at the end of period 𝑡 for period 𝑡 + 𝑚
• 𝑚 - Number of periods to be forecast into the future
Holt-Winters’ Method
Multiplicative Seasonality:
• This model has four equations:
➢ Level estimate:
L_t = α·(y_t / S_{t−p}) + (1 − α)(L_{t−1} + b_{t−1})
➢ Trend estimate:
b_t = β(L_t − L_{t−1}) + (1 − β)·b_{t−1}
➢ Seasonal component estimate:
S_t = γ·(y_t / L_t) + (1 − γ)·S_{t−p}
➢ Forecast made at the end of period t for period t + m:
F_{t+m} = (L_t + m·b_t)·S_{t+m−p}
Holt-Winters’ Method – Multiplicative Seasonality
➢ To obtain initial estimates of the level (L_p), trend (b_p) and seasonal components ({S_1, S_2, S_3, …, S_p}), we need at least one complete season's data (i.e. p periods).
➢ Initialize level as:
L_p = (1/p)(y_1 + y_2 + y_3 + … + y_p)
➢ Initialize trend as:
b_p = (1/p)[ (y_{p+1} − y_1)/p + (y_{p+2} − y_2)/p + (y_{p+3} − y_3)/p + … + (y_{2p} − y_p)/p ]
➢ Initialize seasonal components as:
S_1 = y_1 / L_p;  S_2 = y_2 / L_p;  S_3 = y_3 / L_p;  …;  S_p = y_p / L_p
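
The multiplicative recursion with exactly this initialisation can be sketched in Python as follows; the quarterly sales list is the 2014-2020 series from the example below, and the commented forecasts are approximate because of rounding.

def holt_winters_multiplicative(y, p, alpha, beta, gamma, horizon):
    """Triple exponential smoothing, multiplicative seasonality, initialised from the first season."""
    level = sum(y[:p]) / p                                        # L_p
    trend = sum((y[p + i] - y[i]) / p for i in range(p)) / p      # b_p
    seasonals = [y[i] / level for i in range(p)]                  # S_1 ... S_p
    for t in range(p, len(y)):
        prev_level = level
        level = alpha * y[t] / seasonals[t - p] + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals.append(gamma * y[t] / level + (1 - gamma) * seasonals[t - p])
    return [(level + m * trend) * seasonals[len(y) + m - p - 1]
            for m in range(1, horizon + 1)]

sales = [500, 350, 250, 400, 450, 350, 200, 300, 350, 200, 150, 400,
         550, 350, 250, 550, 550, 400, 350, 600, 750, 500, 400, 650,
         850, 600, 450, 700]                                      # 2014 Q1 - 2020 Q4
print([round(f, 2) for f in holt_winters_multiplicative(sales, 4, 0.4, 0.1, 0.3, 4)])
# roughly [905.10, 618.75, 463.66, 797.09], as in the results table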
Example: Quarterly sales of a company
Fit a Holt-Winters’ model to the following data assuming multiplicative seasonality and predict the sales in the next 4 quarters.

Year  Quarter  Sales
2014  1  500
2014  2  350
2014  3  250
2014  4  400
2015  1  450
2015  2  350
2015  3  200
2015  4  300
2016  1  350
2016  2  200
2016  3  150
2016  4  400
2017  1  550
2017  2  350
2017  3  250
2017  4  550
2018  1  550
2018  2  400
2018  3  350
2018  4  600
2019  1  750
2019  2  500
2019  3  400
2019  4  650
2020  1  850
2020  2  600
2020  3  450
2020  4  700
Example: Quarterly sales of a company
➢ Examination of the plot
shows:
➢ A non-stationary time series
data with a decreasing trend for
the first 3 years and a
subsequent increasing trend.
➢ Volume of sales seems to vary
according to a four-seasonal
pattern.
➢In every year, the maximum
and minimum sales are
observed in Q1 and Q3,
respectively.
Initialization
• We assign α = 0.4, β = 0.1 and γ = 0.3. Here, p = 4.
• Initialization of Level (L_p) and Trend (b_p):

L_4 = (y_1 + y_2 + y_3 + y_4)/4 = (500 + 350 + 250 + 400)/4 = 375

b_4 = (1/4)[ (y_5 − y_1)/4 + (y_6 − y_2)/4 + (y_7 − y_3)/4 + (y_8 − y_4)/4 ]

b_4 = (1/4)[ (450 − 500)/4 + (350 − 350)/4 + (200 − 250)/4 + (300 − 400)/4 ]

b_4 = −12.5
Initialization
• Initialization of Seasonal Component estimates:

S_1 = y_1 / L_4 = 500/375 = 1.333
S_2 = y_2 / L_4 = 350/375 = 0.933
S_3 = y_3 / L_4 = 250/375 = 0.667
S_4 = y_4 / L_4 = 400/375 = 1.067
Here, 𝛼 = 0.4, 𝛽 = 0.1, 𝛾 = 0.3, 𝑝 = 4
𝑚 = 1 (unless otherwise stated)
Results
Year Quarter 𝒕 Sales, 𝒚𝒕 𝑳𝒕 𝒃𝒕 𝑺𝒕 𝑭𝒕+𝒎
2014 1 1 500 1.333
2 2 350 0.933
3 3 250 0.667
4 4 400 375 -12.50 1.067
2015 1 5 450 352.50 -13.50 1.316 483.33
2 6 350 353.40 -12.06 0.950 316.40
3 7 200 324.80 -13.71 0.651 227.56
4 8 300 299.15 -14.91 1.048 331.83
2016 1 9 350 276.91 -15.64 1.301 374.16
2 10 200 240.93 -17.67 0.914 248.32
3 11 150 226.06 -17.39 0.655 145.43
4 12 400 277.94 -10.47 1.165 218.58
2017 1 13 550 329.64 -4.25 1.411 347.88
2 14 350 348.35 -1.95 0.941 297.52
3 15 250 360.50 -0.54 0.667 226.90
4 16 550 404.81 3.94 1.223 419.35
Year Quarter 𝒕 Sales, 𝒚𝒕 𝑳𝒕 𝒃𝒕 𝑺𝒕 𝑭𝒕+𝒎

2018 1 17 550 401.17 3.18 1.399 576.74


2 18 400 412.56 4.00 0.950 380.69
3 19 350 459.97 8.34 0.695 277.67
4 20 600 477.21 9.23 1.233 572.79
2019 1 21 750 506.31 11.22 1.424 680.53
2 22 500 521.07 11.57 0.953 491.60
3 23 400 549.84 13.29 0.705 370.12
4 24 650 548.69 11.85 1.219 694.55
2020 1 25 850 575.14 13.31 1.440 798.03
2 26 600 604.96 14.96 0.964 560.67
3 27 450 627.40 15.71 0.708 436.83
4 28 700 615.61 12.96 1.194 783.78
2021 1 29 905.10 (m = 1)
2 30 618.75 (m = 2)
3 31 463.66 (m = 3)
4 32 797.09 (m = 4)
Sample Calculations
• Forecast made at the end of period 4 for period 5 (F_{t+m}): (Here, t = 4, m = 1)

F_5 = F_{4+1} = (L_4 + 1·b_4)·S_1

F_5 = (375 + 1 × (−12.5)) × 1.333 = 483.33

• Level estimate for period 5 (L_5):

L_5 = α·(y_5 / S_1) + (1 − α)(L_4 + b_4)

L_5 = (0.4)(450/1.333) + (1 − 0.4)(375 + (−12.5)) = 352.5
Sample Calculations
• Seasonal Component estimate for period 5 (S_5):

S_5 = γ·(y_5 / L_5) + (1 − γ)·S_1

S_5 = (0.3)(450/352.5) + (1 − 0.3)(1.333) = 1.316

• Trend estimate for period 5 (b_5):

b_5 = β(L_5 − L_4) + (1 − β)·b_4

b_5 = (0.1)(352.5 − 375) + (1 − 0.1)(−12.5) = −13.5


Sample Calculations
• Forecast made at the end of period 5 for period 6 (F_{t+m}): (Here, t = 5, m = 1)

F_6 = F_{5+1} = (L_5 + 1·b_5)·S_2

F_6 = (352.5 + 1 × (−13.5)) × 0.933

F_6 = 316.4
Forecast for periods 29-32
• Forecast made at the end of period 28 for period 29 (F_{t+m}): (Here, t = 28, m = 1)

F_29 = F_{28+1} = (L_28 + 1·b_28)·S_25
F_29 = (615.61 + (1)(12.96))(1.440) = 905.10

• Forecast made at the end of period 28 for period 30 (F_{t+m}): (Here, t = 28, m = 2)

F_30 = F_{28+2} = (L_28 + 2·b_28)·S_26
F_30 = (615.61 + (2)(12.96))(0.964) = 618.75

• Forecast made at the end of period 28 for period 31 (F_{t+m}): (Here, t = 28, m = 3)

F_31 = F_{28+3} = (L_28 + 3·b_28)·S_27
F_31 = (615.61 + (3)(12.96))(0.708) = 463.66

• Forecast made at the end of period 28 for period 32 (F_{t+m}): (Here, t = 28, m = 4)

F_32 = F_{28+4} = (L_28 + 4·b_28)·S_28
F_32 = (615.61 + (4)(12.96))(1.194) = 797.09


Holt-Winters’ Method
Additive Seasonality:
• This model has four equations:
➢ Level estimate:
L_t = α(y_t − S_{t−p}) + (1 − α)(L_{t−1} + b_{t−1})
➢ Trend estimate:
b_t = β(L_t − L_{t−1}) + (1 − β)·b_{t−1}
➢ Seasonal Component estimate:
S_t = γ(y_t − L_t) + (1 − γ)·S_{t−p}
➢ Forecast made at the end of period t for period t + m:
F_{t+m} = L_t + m·b_t + S_{t+m−p}
Holt-Winters’ Method – Additive Seasonality
➢ To obtain initial estimates of the level (L_p), trend (b_p) and seasonal components ({S_1, S_2, S_3, …, S_p}), we need at least one complete season's data (i.e. p periods).
➢ Initialize level as:
L_p = (1/p)(y_1 + y_2 + y_3 + … + y_p)
➢ Initialize trend as:
b_p = (1/p)[ (y_{p+1} − y_1)/p + (y_{p+2} − y_2)/p + (y_{p+3} − y_3)/p + … + (y_{2p} − y_p)/p ]
➢ Initialize seasonal components as:
S_1 = y_1 − L_p;  S_2 = y_2 − L_p;  S_3 = y_3 − L_p;  …;  S_p = y_p − L_p
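
A matching Python sketch of the additive variant (level and trend initialised as above, seasonal terms stored as differences); it mirrors the multiplicative version given earlier.

def holt_winters_additive(y, p, alpha, beta, gamma, horizon):
    """Triple exponential smoothing, additive seasonality, initialised from the first season."""
    level = sum(y[:p]) / p
    trend = sum((y[p + i] - y[i]) / p for i in range(p)) / p
    seasonals = [y[i] - level for i in range(p)]                  # S_1 ... S_p as differences
    for t in range(p, len(y)):
        prev_level = level
        level = alpha * (y[t] - seasonals[t - p]) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals.append(gamma * (y[t] - level) + (1 - gamma) * seasonals[t - p])
    return [level + m * trend + seasonals[len(y) + m - p - 1]
            for m in range(1, horizon + 1)]

sales = [500, 350, 250, 400, 450, 350, 200, 300, 350, 200, 150, 400,
         550, 350, 250, 550, 550, 400, 350, 600, 750, 500, 400, 650,
         850, 600, 450, 700]
print([round(f, 2) for f in holt_winters_additive(sales, 4, 0.4, 0.1, 0.3, 4)])
# roughly [814.22, 619.61, 528.58, 757.68], as in the additive results table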
Example: Quarterly sales of a company
Fit a Holt-Winters’ model to the same quarterly sales data (2014 Q1 – 2020 Q4, tabulated in the multiplicative-seasonality example above) assuming additive seasonality and predict the sales in the next 4 quarters.
Initialization
• We assign α = 0.4, β = 0.1 and γ = 0.3. Here, p = 4.
• Initialization of Level (L_p) and Trend (b_p):

L_4 = (y_1 + y_2 + y_3 + y_4)/4 = (500 + 350 + 250 + 400)/4 = 375

b_4 = (1/4)[ (y_5 − y_1)/4 + (y_6 − y_2)/4 + (y_7 − y_3)/4 + (y_8 − y_4)/4 ]

b_4 = (1/4)[ (450 − 500)/4 + (350 − 350)/4 + (200 − 250)/4 + (300 − 400)/4 ]

b_4 = −12.5
Initialization

• Initialization of Seasonal Component estimates:

𝑆1 = 𝑦1 − 𝐿4 = 500 − 375 = 𝟏𝟐𝟓

𝑆2 = 𝑦2 − 𝐿4 = 350 − 375 = −𝟐𝟓

𝑆3 = 𝑦3 − 𝐿4 = 250 − 375 = −𝟏𝟐𝟓

𝑆4 = 𝑦4 − 𝐿4 = 400 − 375 = 𝟐𝟓
Here, 𝛼 = 0.4, 𝛽 = 0.1, 𝛾 = 0.3, 𝑝 = 4 Results
𝑚 = 1 (unless otherwise stated)

Year Quarter 𝒕 Sales, 𝒚𝒕 𝑳𝒕 𝒃𝒕 𝑺𝒕 𝑭𝒕+𝒎


2014 1 1 500 125
2 2 350 -25
3 3 250 -125
4 4 400 375 -12.50 25
2015 1 5 450 347.50 -14.00 118.25 487.50
2 6 350 350.10 -12.34 -17.53 308.50
3 7 200 332.66 -12.85 -127.30 212.76
4 8 300 301.88 -14.64 16.93 344.81
2016 1 9 350 265.04 -16.86 108.26 405.49
2 10 200 235.92 -18.09 -23.05 230.65
3 11 150 241.62 -15.71 -116.59 90.54
4 12 400 288.77 -9.42 45.22 242.84
2017 1 13 550 344.30 -2.93 137.49 387.61
2 14 350 354.04 -1.66 -17.35 318.33
3 15 250 358.07 -1.09 -114.04 235.79
4 16 550 416.10 4.82 71.83 402.20
Year Quarter 𝒕 Sales, 𝒚𝒕 𝑳𝒕 𝒃𝒕 𝑺𝒕 𝑭𝒕+𝒎

2018 1 17 550 417.55 4.48 135.98 558.41


2 18 400 420.16 4.30 -18.19 404.69
3 19 350 440.29 5.88 -106.91 310.42
4 20 600 478.97 9.16 86.59 517.99
2019 1 21 750 538.49 14.19 158.64 624.11
2 22 500 538.88 12.82 -24.40 534.49
3 23 400 533.78 11.02 -114.97 444.79
4 24 650 552.25 11.77 89.94 631.40
2020 1 25 850 614.95 16.86 181.56 722.66
2 26 600 628.85 16.56 -25.73 607.42
3 27 450 613.24 13.35 -129.45 530.44
4 28 700 619.98 12.69 86.96 716.52
2021 1 29 814.22 (m = 1)
2 30 619.61 (m = 2)
3 31 528.58 (m = 3)
4 32 757.68 (m = 4)
Sample Calculations
• Forecast made at the end of period 4 for period 5 (F_{t+m}): (Here, t = 4, m = 1)

F_5 = F_{4+1} = L_4 + 1·b_4 + S_1

F_5 = 375 + 1 × (−12.5) + 125 = 487.50

• Level estimate for period 5 (L_5):

L_5 = α(y_5 − S_1) + (1 − α)(L_4 + b_4)

L_5 = (0.4)(450 − 125) + (1 − 0.4)(375 + (−12.5)) = 347.5


Sample Calculations
• Seasonal Component estimate for period 5 (S_5):

S_5 = γ(y_5 − L_5) + (1 − γ)·S_1

S_5 = (0.3)(450 − 347.5) + (1 − 0.3)(125) = 118.25

• Trend estimate for period 5 (b_5):

b_5 = β(L_5 − L_4) + (1 − β)·b_4

b_5 = (0.1)(347.5 − 375) + (1 − 0.1)(−12.5) = −14


Sample Calculations

• Forecast made at the end of period 5 for period 6 (F_{t+m}): (Here, t = 5, m = 1)

F_6 = F_{5+1} = L_5 + 1·b_5 + S_2

F_6 = 347.5 + 1 × (−14) + (−25)

F_6 = 308.5
Forecast for periods 29-32
• Forecast made at the end of period 28 for period 29 (F_{t+m}): (Here, t = 28, m = 1)

F_29 = F_{28+1} = L_28 + 1·b_28 + S_25
F_29 = 619.98 + (1)(12.69) + 181.56 = 814.22

• Forecast made at the end of period 28 for period 30 (F_{t+m}): (Here, t = 28, m = 2)

F_30 = F_{28+2} = L_28 + 2·b_28 + S_26
F_30 = 619.98 + (2)(12.69) + (−25.73) = 619.61

• Forecast made at the end of period 28 for period 31 (F_{t+m}): (Here, t = 28, m = 3)

F_31 = F_{28+3} = L_28 + 3·b_28 + S_27
F_31 = 619.98 + (3)(12.69) + (−129.45) = 528.58

• Forecast made at the end of period 28 for period 32 (F_{t+m}): (Here, t = 28, m = 4)

F_32 = F_{28+4} = L_28 + 4·b_28 + S_28
F_32 = 619.98 + (4)(12.69) + 86.96 = 757.68
𝐹32 = 619.98 + 4 12.69 + 86.96 = 𝟕𝟓𝟕. 𝟔𝟖


Regression-based Models
➢ Regression is defined as a functional relationship between two or
more correlated variables.

➢ Such functional relationships are developed from observed data, and


they can then be used to predict one variable given the other
variable(s).

➢ Regression-based models are also referred to as associative models


or causal models.

➢ Regression-based models can be broadly classified as follows:


➢ Linear Models
➢ Simple Linear Regression (Only one predictor)
➢ Multiple Linear Regression (Two or more predictors)
➢ Non-Linear Models
Linear Regression
➢ In linear regression, the functional relationships between
variables are modelled using linear predictor functions.

➢ The most general linear regression model used in forecasting takes the following form:

Ŷ(X) = a_0 + a_1·X_1 + a_2·X_2 + … + a_{n−1}·X_{n−1} + a_n·X_n

➢ Here, X_1, X_2, …, X_n are called the predictor (independent) variables and Ŷ is called the forecast (dependent / response) variable.

➢ The unknown model parameters 𝑎0 , 𝑎1 , 𝑎2 , … , 𝑎𝑛 can be


determined by minimizing an error function (usually the sum of
squared errors).
Linear Regression
➢ A special case of the general linear regression model is the polynomial of the form:

Ŷ(X) = a_0 + a_1·X + a_2·X² + … + a_{n−1}·X^{n−1} + a_n·X^n

➢ Even though the relationship between the dependent and independent variable is non-linear, the above equation is still linear in terms of the parameters a_0, a_1, a_2, …, a_n.

➢ Truncating later terms in the above expression, we get:
Ŷ(X) = a_0 : Constant model
Ŷ(X) = a_0 + a_1·X : Simple linear model with slope a_1
Ŷ(X) = a_0 + a_1·X + a_2·X² : Quadratic model
Non-Linear Regression
➢ In non-linear regression, observational data are modelled by a
function which is a nonlinear combination of the model parameters.

➢ Here, determining the model parameters is much harder than in the


case of linear regression, and often only approximate iterative
procedures are available.

➢ However, in some cases, a suitable variable transformation can


make the model linear.

➢ For example, consider an exponential model:
➢ Ŷ(X) = a·e^{bX}
A logarithmic transformation gives:
➢ ln[Ŷ(X)] = ln(a) + ln(e^{bX})
➢ ln[Ŷ(X)] = ln(a) + b·X, which is linear in terms of ln(a) and b.
Method of Ordinary Least Squares (OLS) for
Linear Regression
➢ Let X_i = (X_{i1}, X_{i2}, X_{i3}, …, X_{ik}) be the vector of values of k predictor variables for the i-th of n observations.

➢ Let Ŷ_i = f(X_i) be the predicted response and Y_i be the actual response for the i-th observation, respectively.

➢ Using the sum of deviations, i.e., Σ_{i=1}^{n} (Y_i − Ŷ_i), could lead to incorrect inferences about the accuracy of forecasts due to the cancelling out of positive and negative errors.

➢ So we take the square of the deviations, find their sum, i.e., Σ_{i=1}^{n} (Y_i − Ŷ_i)², and minimize it to determine the predictor coefficients.
Method of Ordinary Least Squares (OLS)
(Figure: scatter plot of responses Y_1, …, Y_6 against predictor values X_1, …, X_6, illustrating the deviations that the least-squares fit minimizes.)
Method of Ordinary Least Squares (OLS)

Consider a simple linear regression model.

Let Y_i be the actual response and Ŷ_i be the predicted response.

Let e_i be the error for the i-th period:
e_i = Y_i − Ŷ_i
But Ŷ_i = a + b·X_i.
Therefore, e_i = Y_i − a − b·X_i.
Let E(a, b) be the sum of the squared errors:

E(a, b) = Σ_{i=1}^{n} (Y_i − Ŷ_i)² = Σ_{i=1}^{n} (Y_i − a − b·X_i)²

Find a and b such that E is minimum.



It can be shown that the resulting least-squares estimators b and a are:

b = [ n·Σ X_i·Y_i − (Σ X_i)(Σ Y_i) ] / [ n·Σ X_i² − (Σ X_i)² ] = [ Σ X_i·Y_i − n·X̄·Ȳ ] / [ Σ X_i² − n·X̄² ]

a = [ Σ Y_i − b·Σ X_i ] / n = Ȳ − b·X̄
The formula for b can also be represented in alternative ways, such as

b = E[(X − E[X])(Y − E[Y])] / E[(X − E[X])²] = [ E[XY] − E[X]·E[Y] ] / [ E[X²] − (E[X])² ]

b = Cov(X, Y) / Var(X)

where Cov(X, Y) is the covariance of X and Y.

If we normalize the scale such that Σ_{i=1}^{n} X_i = 0, then

b = Σ X_i·Y_i / Σ X_i²   and   a = (Σ Y_i) / n = Ȳ
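
These closed-form estimates translate directly into a small Python sketch; the data are the 11-year demand series used in the next example, with 2008 coded as year zero.

def least_squares_line(x, y):
    """Slope b and intercept a of the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) \
        / (sum(xi * xi for xi in x) - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b

years = list(range(1, 12))                          # 2009-2019 coded as 1..11
demand = [2, 3, 6, 10, 8, 7, 12, 14, 14, 18, 19]
a, b = least_squares_line(years, demand)
print(round(a, 2), round(b, 3), round(a + b * 12, 2))   # about 0.4, 1.645 and the 2020 forecast 20.15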
Correlation Coefficient
• Correlation (Dependence): It is the existence of a
statistical (not necessarily causal) relationship between a
pair of variables.

• Pearson Correlation Coefficient (𝒓𝑿𝒀 ): It is a relative


measure of the degree of linear relationship between two
variables (𝑋 and 𝑌).
𝐶𝑜𝑣(𝑋, 𝑌)
𝑟𝑋𝑌 =
𝜎𝑋 𝜎𝑌

• Where 𝐶𝑜𝑣(𝑋, 𝑌) is the co-variance of 𝑋 and 𝑌, and 𝜎𝑋 and


𝜎𝑌 are the standard deviations of 𝑋 and 𝑌 respectively.
• 𝑟𝑋𝑌 always lies in the range [−1,1], with ±1 indicating
perfect correlation and 0 indicating no correlation.
Co-efficient of Determination
• Coefficient of Determination (R²): It is the proportion of total variation in the response that is explained by the regression line.

R² = Σ_{i=1}^{n} (Ŷ_i − Ȳ)² / Σ_{i=1}^{n} (Y_i − Ȳ)²
• It lies in the range [0,1], with a value of 0 indicating that


the regression line does not explain any variation in the
response, and a value of 1 indicating that the regression
line fully explains the variations in the response.
Example
Sales data for a certain item is given for the last 11 years. Fit a regression line and make a forecast for the year 2020. Taking year 2008 as year zero, the table is constructed as follows:

Year  Demand
2009  2
2010  3
2011  6
2012  10
2013  8
2014  7
2015  12
2016  14
2017  14
2018  18
2019  19
Determination of parameters
Least Squares Fit
Year X Y XY X2
2009 1 2 2 1
2010 2 3 6 4
2011 3 6 18 9
2012 4 10 40 16
2013 5 8 40 25
2014 6 7 42 36
2015 7 12 84 49
2016 8 14 112 64
2017 9 14 126 81
2018 10 18 180 100
2019 11 19 209 121
Sums: Σ X_i = 66,  Σ Y_i = 113,  Σ X_i·Y_i = 859,  Σ X_i² = 506
X̄ = E[X] = Σ X_i / n = 66/11 = 6;   Ȳ = E[Y] = Σ Y_i / n = 113/11 = 10.273

b = [ Σ X_i·Y_i − n·X̄·Ȳ ] / [ Σ X_i² − n·X̄² ] = (859 − 11 × 6 × 10.273) / (506 − 11 × 6²) = 1.645

a = Ȳ − b·X̄ = 10.273 − 1.645 × 6 = 0.4

Forecasting equation is:
Ŷ(X) = a + b·X = 0.4 + 1.645·X

For year 2020 (X = 2020 − 2008 = 12), the forecast is:

Ŷ(12) = 0.4 + 1.645 × 12 = 20.15 units
The correlation coefficient is given by:

r_XY = Cov(X, Y) / (σ_X·σ_Y) = [ E[XY] − E[X]·E[Y] ] / √{ (E[X²] − (E[X])²)(E[Y²] − (E[Y])²) }

r_XY = (78.091 − 6 × 10.273) / √{ (46 − 6²)(134.818 − 10.273²) } = 0.9615

The coefficient of determination is given by:

R² = Σ_{i=1}^{n} (Ŷ_i − Ȳ)² / Σ_{i=1}^{n} (Y_i − Ȳ)² = Σ_{i=1}^{n} (a + b·X_i − Ȳ)² / Σ_{i=1}^{n} (Y_i − Ȳ)² = 27.075 / 29.289

R² = 0.9244
Regression Analysis
• A simple regression model (𝑌෠ = 𝑎 + 𝑏𝑋)
developed for making forecasts must suitably
answer the following questions:

• Is the coefficient (𝑏) of the predictor variable 𝑋


significantly different from 0?

• How confident can we be of the estimates of the


regression coefficient 𝑏?

• How confident can we be that the actual value (Y) of the response variable will lie within a small range of the forecasted value (Ŷ)?
Regression Analysis
• The first question addresses the significance of the
regression coefficient 𝑏, specifically whether the
true value of 𝑏 is really different from 0.

• The true value of b is unknown and we use the available sample data to estimate it (the estimate is denoted b̂).

• However, since b̂ has been estimated based on a limited number of observations, it is possible that b̂ might be non-zero purely by chance.
Regression Analysis
• To test the significance of 𝑏, we define the null
hypothesis (𝐻0 ) and alternative hypothesis (𝐻1 ) as
follows:

• 𝐻0 : 𝑏 = 0; 𝐻1 : 𝑏 ≠ 0

• Then the standard error of b is computed:

SE_b = √{ [ Σ_{i=1}^{n} (Y_i − Ŷ_i)² / (n − k − 1) ] / Σ_{i=1}^{n} (X_i − X̄)² }

• Here, n is the no. of observations and k is the no. of predictors (k = 1 for simple regression).
Regression Analysis
• The estimate of b (b̂) is divided by this standard error to obtain the test statistic (t).

• This test statistic is then compared with the critical value (t_{α/2, n−2}) of the Student's t-distribution at a given level of significance (α).

• If |t| is greater than t_{α/2, n−2}, the null hypothesis can be rejected and it can be concluded that b is significantly different from zero.
Example
• Considering the regression model for the previous sales example: Ŷ(X) = 0.4 + 1.645·X

SE_b = √{ [24.355 / (11 − 2)] / 110 } = 0.1568

t = b̂ / SE_b = 1.645 / 0.1568 = 10.491

• At α = 0.05, t_c = t_{0.025, 9} = 2.262
• Since t > t_c, we can conclude that b is significantly different from 0.
Regression Analysis
• The second question is concerned with the
precision with which the regression coefficient 𝑏
can be estimated.

• Again, since the true value of b is unknown, we use an estimate of b (b̂) calculated from the sample observations to fit the regression equation.

• The probability that the true value of b will be exactly equal to b̂ is zero.
Regression Analysis
• However, it is statistically possible to predict (with a degree of confidence) that the true value of b will lie in a small interval around the estimated value b̂.

• This interval around the estimated value at a given confidence level C (C = 1 − α) is called the confidence interval (CI).

• Lower limit of CI: LL_b = b̂ − t_{α/2, n−2}·SE_b
• Upper limit of CI: UL_b = b̂ + t_{α/2, n−2}·SE_b
Regression Analysis
• Considering the same example, at C = 0.95 (α = 0.05):

• Lower limit of CI: LL_b = 1.645 − 2.262 × 0.1568 = 1.291

• Upper limit of CI: UL_b = 1.645 + 2.262 × 0.1568 = 2.000

• Thus we can be 95% certain that the true value of b will lie between 1.291 and 2.000.
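
The standard error, t statistic and confidence interval can be reproduced with the short Python sketch below; the critical value 2.262 (t_{0.025, 9}) is supplied by hand rather than looked up programmatically.

import math

def slope_inference(x, y, t_critical):
    """Standard error of the slope, its t statistic and a confidence interval for simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) \
        / (sum(xi * xi for xi in x) - n * x_bar ** 2)
    a = y_bar - b * x_bar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))      # sum of squared residuals
    se_b = math.sqrt((sse / (n - 2)) / sum((xi - x_bar) ** 2 for xi in x))
    return se_b, b / se_b, (b - t_critical * se_b, b + t_critical * se_b)

years = list(range(1, 12))
demand = [2, 3, 6, 10, 8, 7, 12, 14, 14, 18, 19]
print(slope_inference(years, demand, 2.262))        # about 0.157, 10.49 and (1.29, 2.00)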
Regression Analysis
• The third question is concerned with the
precision of the forecasts made using the
regression model.

• In other words, we want to know how confident we can be that the true value of the response variable (Y_{X′}) will lie within a small interval around the forecasted value (Ŷ_{X′} = a + b·X′) for a given period (X′).

• That is, we would like to have a confidence interval around this forecasted value (Ŷ_{X′}).
Regression Analysis
• To calculate the confidence interval for Y_{X′}, it is first necessary to determine the standard error of forecast (SEF_{X′}):

SEF_{X′} = √{ [ Σ_{i=1}^{n} (Y_i − Ŷ_i)² / (n − k − 1) ] · [ 1 + 1/n + (X′ − X̄)² / Σ_{i=1}^{n} (X_i − X̄)² ] }

• Here, X′ is the value of the predictor variable at which the forecast is to be made, and k is the no. of predictor variables (k = 1 for simple regression).
Regression Analysis
• If n is large and the X_i values are well spread out (not all close to the mean X̄), the correction terms 1/n and (X′ − X̄)²/Σ(X_i − X̄)² become small, so the bracketed factor is approximately equal to 1 and can be neglected.

• The confidence interval limits are then calculated as:

• Lower limit of CI: LL_{Y_{X′}} = Ŷ_{X′} − t_{α/2, n−2}·SEF_{X′}
• Upper limit of CI: UL_{Y_{X′}} = Ŷ_{X′} + t_{α/2, n−2}·SEF_{X′}
Regression Analysis
• Considering the previous example, in year 2020 (X′ = 2020 − 2008 = 12), SEF_{X′} is calculated as:

SEF_12 = √{ [24.355 / (11 − 2)] · [1 + 1/11 + (12 − 6)²/110] } = 1.959

• The forecasted value for year 2020 (X′ = 12) is given by:
Ŷ_{X′} = 0.4 + 1.645·X′
Ŷ_12 = 0.4 + 1.645 × 12 = 20.15
Regression Analysis
• At C = 0.95 (α = 0.05):
• Lower limit of CI: LL_{Y_{X′}} = 20.15 − 2.262 × 1.959 = 15.71
• Upper limit of CI: UL_{Y_{X′}} = 20.15 + 2.262 × 1.959 = 24.58

• Thus we can be 95% certain that the actual sales for the year 2020 will lie between 15.71 and 24.58.
Regression Analysis
• The variation in the ranges of confidence of individual forecasts: since SEF_{X′} grows with (X′ − X̄)², the confidence interval is narrowest near X̄ and widens as X′ moves away from it.
Example
• Consider the following data for the yield of a chemical reaction measured at different temperatures. Fit a suitable regression model, estimate its parameters and predict the yield at 60 °C.

Temp (°C)  Yield (mol/kmol)
20  102
22  111
24  120
26  129
28  133
30  149
32  161
34  182
36  200
38  215
40  243
42  273
44  292
46  324
48  359
50  396
52  449
54  479
56  524
58  573
Solution
• From a plot of the observed data (yield in mol/kmol against temperature in °C), we can see a quadratic relationship between yield and temperature.
Solution
• Fitting a quadratic regression model to the data and applying the OLS method to determine the coefficients, the best-fit regression equation is found to be:

Predicted Yield = 218.436 − 11.504·Temp + 0.303·Temp²

• Predicted yield at 60 °C is given by:
Predicted Yield(60 °C) = 218.436 − 11.504 × 60 + 0.303 × 60² = 618.48 mol/kmol
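
A minimal sketch of the quadratic fit using NumPy's polyfit; the coefficients should be close to those quoted above (small differences are possible because the slide's coefficients are rounded).

import numpy as np

temp = np.arange(20, 60, 2)                                      # 20, 22, ..., 58 degrees C
yield_ = np.array([102, 111, 120, 129, 133, 149, 161, 182, 200, 215,
                   243, 273, 292, 324, 359, 396, 449, 479, 524, 573])
c2, c1, c0 = np.polyfit(temp, yield_, 2)                         # quadratic least-squares fit
print(round(c0, 3), round(c1, 3), round(c2, 3))                  # close to 218.436, -11.504, 0.303
print(round(c0 + c1 * 60 + c2 * 60 ** 2, 2))                     # predicted yield at 60 degrees C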
Variable Transformation in Regression
• In some cases, the response variable may be non-linear in terms of the regression parameters (e.g. Ŷ = a·e^{bX}).

• The OLS method will not be the best estimator


in such situations.

• However, by suitably transforming the


variables, a linear relationship can be obtained
between the transformed response variable
and the transformed parameters.
Variable Transformation in Regression

• Considering an exponential regression model:

Ŷ = a·e^{bX}

• Taking the natural logarithm on both sides, we have:

ln(Ŷ) = ln(a) + b·X

• The above expression is linear in terms of the parameters and hence the OLS method can be used to estimate the parameters (ln(a) and b).
Example
• Consider the following sales data of smartphones.
Predict the sales for the 31st period.
Period (t), Sales (in 10^5 units), Period (t), Sales (in 10^5 units)
1 7 16 22
2 7 17 24
3 8 18 26
4 8 19 28
5 9 20 30
6 10 21 32
7 11 22 35
8 11 23 38
9 12 24 41
10 13 25 44
11 15 26 48
12 16 27 52
13 17 28 57
14 19 29 62
15 20 30 67
Example
• Plotting the sales against the period and the natural logarithm of sales against the period, we see that raw sales grow non-linearly over time while log(sales) increases approximately linearly with the period.
Example
• Here, the natural logarithm of the response
variable (sales) is linear in terms of the
predictor variable (time).

• Hence, an exponential regression model can


be used in this scenario.
𝑌෠ = 𝑎𝑒 𝑏𝑡

• Taking the natural logarithm on both sides, we


have
ln 𝑌෠ = ln 𝑎 + 𝑏𝑡
Example
• Using the OLS method, the parameters ln(a) and b are estimated as follows:

ln(a) = 1.8087 ⇒ a = exp(1.8087) = 6.1024

b = 0.0796

• The regression equation is: Ŷ_t = a·e^{bt}

Ŷ_t = 6.1024 · exp(0.0796·t)

• Forecast for the 31st period (t = 31) is:

Ŷ_31 = 6.1024 · exp(0.0796 × 31) = 71.92 × 10^5 units
(or) Ŷ_31 = 71.92 lakh units
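
A Python sketch of the log-transform fit described above, estimating ln(a) and b by ordinary least squares on ln(y); the sales list is the 30-period smartphone series.

import math

def exponential_fit(t, y):
    """Fit y-hat = a * exp(b*t) by least squares on ln(y)."""
    log_y = [math.log(v) for v in y]
    n = len(t)
    t_bar, l_bar = sum(t) / n, sum(log_y) / n
    b = (sum(ti * li for ti, li in zip(t, log_y)) - n * t_bar * l_bar) \
        / (sum(ti * ti for ti in t) - n * t_bar ** 2)
    return math.exp(l_bar - b * t_bar), b                        # (a, b)

sales = [7, 7, 8, 8, 9, 10, 11, 11, 12, 13, 15, 16, 17, 19, 20,
         22, 24, 26, 28, 30, 32, 35, 38, 41, 44, 48, 52, 57, 62, 67]
a, b = exponential_fit(list(range(1, 31)), sales)
print(round(a, 4), round(b, 4), round(a * math.exp(b * 31), 2))  # about 6.10, 0.0796 and 71.9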
Harmonic Regression
• Harmonic regression uses a combination of periodic
functions to predict the value of a response variable.

• The most common type of periodic functions include


trigonometric functions like sine and cosine.

• Harmonic regression is especially suited to modelling


seasonality and cyclicity.

• In cycles, patterns of rise and fall may be repeated over


irregular periods of time, while seasons always
correspond to fixed, known periods of time.

• Graphically, the presence of cyclicity is characterized by


peaks at irregular intervals, while seasonality is
characterized by peaks at regular intervals.
Harmonic Regression
• Considering only the fundamental frequency:

Ŷ(t) = A·sin(2πt/p) + B·cos(2πt/p)

• Considering first-order and second-order frequencies:

Ŷ(t) = A_1·sin(2πt/p) + B_1·cos(2πt/p) + A_2·sin(4πt/p) + B_2·cos(4πt/p)

• Considering higher-order (up to order n) frequencies:

Ŷ(t) = Σ_{i=1}^{n} A_i·sin(2πit/p) + Σ_{i=1}^{n} B_i·cos(2πit/p)
Harmonic Regression (Seasonality)
• Considering the same example as in Holt-Winters’ method,
the quarterly sales data for a company is tabulated below:
Year Quarter sales
2014 1 500
2 350
3 250
4 400
2015 1 450
2 350
3 200
4 300
2016 1 350
2 200
3 150
4 400
2017 1 550
2 350
3 250
4 550
2018 1 550
2 400
3 350
4 600
2019 1 750
2 500
3 400
4 650
2020 1 850
2 600
3 450
4 700
Harmonic Regression (Seasonality)

• Since the data contains seasonality and trend, assuming an additive model, the regression equation can be written as:

Ŷ(t) = A + B·t + C·sin(2πt/p) + D·cos(2πt/p)

• Here, higher order harmonics are not considered.

• Considering the sine and cosine functions as independent


predictor variables, the model can be treated as a multiple
linear regression model with three predictors and four
coefficients.
Harmonic Regression (Seasonality)

• From a plot of the data, it can be observed that the peaks


occur roughly once every 4 observations.

• Hence the period 𝑝 is set as 4 for this example.

• The estimated regression coefficients are:

𝐴መ = 244.98; 𝐵෠ = 13.65; 𝐶መ = 152.93; 𝐷


෡ = 47.07;

• The results are tabulated in the following slide:


Year Quarter Sales t sin (pi/2 t) cos (pi/2 t) Pred
2014 1 500 1 1.0000 0.0000 411.56
2 350 2 0.0000 -1.0000 225.21
3 250 3 -1.0000 0.0000 132.99
4 400 4 0.0000 1.0000 346.64
2015 1 450 5 1.0000 0.0000 466.15
2 350 6 0.0000 -1.0000 279.79
3 200 7 -1.0000 0.0000 187.58
4 300 8 0.0000 1.0000 401.22
2016 1 350 9 1.0000 0.0000 520.73
2 200 10 0.0000 -1.0000 334.38
3 150 11 -1.0000 0.0000 242.16
4 400 12 0.0000 1.0000 455.81
2017 1 550 13 1.0000 0.0000 575.32
2 350 14 0.0000 -1.0000 388.97
3 250 15 -1.0000 0.0000 296.75
4 550 16 0.0000 1.0000 510.39
2018 1 550 17 1.0000 0.0000 629.91
2 400 18 0.0000 -1.0000 443.55
3 350 19 -1.0000 0.0000 351.33
4 600 20 0.0000 1.0000 564.98
2019 1 750 21 1.0000 0.0000 684.49
2 500 22 0.0000 -1.0000 498.14
3 400 23 -1.0000 0.0000 405.92
4 650 24 0.0000 1.0000 619.57
2020 1 850 25 1.0000 0.0000 739.08
2 600 26 0.0000 -1.0000 552.72
3 450 27 -1.0000 0.0000 460.51
4 700 28 0.0000 1.0000 674.15
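
The same harmonic model can be fitted with a short NumPy least-squares sketch; the design matrix has columns 1, t, sin(2πt/p) and cos(2πt/p), and the printed coefficients should roughly reproduce those reported above.

import numpy as np

sales = np.array([500, 350, 250, 400, 450, 350, 200, 300, 350, 200, 150, 400,
                  550, 350, 250, 550, 550, 400, 350, 600, 750, 500, 400, 650,
                  850, 600, 450, 700], dtype=float)
t = np.arange(1, len(sales) + 1)
p = 4                                                            # four quarters per seasonal cycle
X = np.column_stack([np.ones_like(t, dtype=float), t,
                     np.sin(2 * np.pi * t / p), np.cos(2 * np.pi * t / p)])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)                 # [A, B, C, D]
print(np.round(coef, 2))                                         # roughly [244.98, 13.65, 152.93, 47.07]
t_new = 29                                                       # 2021 Q1
print(round(coef @ [1.0, t_new, np.sin(2 * np.pi * t_new / p),
                    np.cos(2 * np.pi * t_new / p)], 2))          # forecast from the fitted model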
Harmonic Regression (Cyclicity)
• Consider the following monthly data for paper production
in the US over a 4-year period:
Period Paper_Prod Period Paper_Prod
Jan-17 2740 Jan-19 3870
Feb-17 2805 Feb-19 3850
Mar-17 2835 Mar-19 3810
Apr-17 2840 Apr-19 3800
May-17 2895 May-19 3790
Jun-17 2905 Jun-19 3820
Jul-17 2990 Jul-19 3910
Aug-17 3070 Aug-19 3980
Sep-17 3185 Sep-19 4030
Oct-17 3275 Oct-19 4110
Nov-17 3320 Nov-19 4195
Dec-17 3305 Dec-19 4235
Jan-18 3285 Jan-20 4325
Feb-18 3255 Feb-20 4395
Mar-18 3235 Mar-20 4475
Apr-18 3225 Apr-20 4510
May-18 3260 May-20 4495
Jun-18 3345 Jun-20 4470
Jul-18 3405 Jul-20 4450
Aug-18 3595 Aug-20 4435
Sep-18 3725 Sep-20 4425
Oct-18 3790 Oct-20 4485
Nov-18 3850 Nov-20 4585
Dec-18 3875 Dec-20 4635
Harmonic Regression (Cyclicity)
• From a plot of the production data (monthly paper production vs. period, Jan 2017 – Dec 2020), it can be clearly seen that the data exhibits cyclicity as well as an increasing trend.
Harmonic Regression (example)
• Assuming an additive model and considering only the first- and second-order frequencies, the regression equation can be written as:

Ŷ(t) = A + B·t + C·sin(2πt/p) + D·cos(2πt/p) + E·sin(4πt/p) + F·cos(4πt/p)

• Here, the period 𝑝 is taken as 12 months.

• Considering the sine and cosine functions as independent


predictor variables, the model can be treated as a multiple
linear regression model with five predictors and six
coefficients.
Harmonic Regression (example)
• The estimated values of the coefficients based on the method of least squares are:

Â = 2742.29;  B̂ = 40.37;  Ĉ = −8.39;  D̂ = 68.36;  Ê = −4.61;  F̂ = −5.26

• The regression equation is:

Ŷ(t) = 2742.29 + 40.37·t − 8.39·sin(2πt/12) + 68.36·cos(2πt/12) − 4.61·sin(4πt/12) − 5.26·cos(4πt/12)

• The forecast for month 49 (Jan 2021) is:

Ŷ(49) = 2742.29 + 40.37 × 49 − 8.39·sin(2π×49/12) + 68.36·cos(2π×49/12) − 4.61·sin(4π×49/12) − 5.26·cos(4π×49/12)

Ŷ(49) = 4768.4 units
Regression vs Linear Programming Model

• Given a dataset with n observations, run the regression model on the dataset to get the intercept (a) and the slope (b), and calculate the forecasted value (ŷ = a + b·x) for each observation.

x  y
1  17500
2  19000
3  23000
4  33000
5  37250
6  35500
7  41500
Results with Regression
a = 12500
b = 4258.93
ŷ = 12500 + 4258.93·x

x  y  ŷ
1  17500  16758.93
2  19000  21017.86
3  23000  25276.79
4  33000  29535.72
5  37250  33794.65
6  35500  38053.57
7  41500  42312.5
• Mean Absolute Percentage Error (MAPE) = 7.67%
Linear Programming Model
Let E_i (percentage error), a and b (regression coefficients) be the decision variables.
Objective:
Min Z = (1/n) Σ_{i=1}^{n} E_i
subject to
1. ŷ_i = a + b·x_i ;  ∀i ∈ {1, 2, 3, …, n}
2. E_i ≥ (ŷ_i − y_i)/y_i ;  ∀i ∈ {1, 2, 3, …, n}
3. E_i ≥ (y_i − ŷ_i)/y_i ;  ∀i ∈ {1, 2, 3, …, n}
E_i ≥ 0 ;  ∀i ∈ {1, 2, 3, …, n}
a, b unrestricted in sign

Results with LP using GAMS:
Z = 0.0719
a = 13500, b = 4000
ŷ = 13500 + 4000·x

MAPE = 7.19%
The MAPE obtained using the linear programming model (7.19%) is less than that obtained using regression (7.67%).
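
A sketch of the same LP using SciPy's linprog instead of GAMS; the decision variables are a, b and the percentage errors E_i. The minimum MAPE should come out close to 0.0719, though the optimal a and b could differ slightly if alternative optima exist.

import numpy as np
from scipy.optimize import linprog

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([17500, 19000, 23000, 33000, 37250, 35500, 41500], dtype=float)
n = len(x)

# Variable order: [a, b, E_1, ..., E_n]; objective = mean of the E_i.
c = np.concatenate([[0.0, 0.0], np.full(n, 1.0 / n)])
A_ub, b_ub = [], []
for i in range(n):
    e = np.zeros(n)
    e[i] = -1.0
    # E_i >= (a + b*x_i - y_i)/y_i   and   E_i >= (y_i - a - b*x_i)/y_i
    A_ub.append(np.concatenate([[1 / y[i], x[i] / y[i]], e]));   b_ub.append(1.0)
    A_ub.append(np.concatenate([[-1 / y[i], -x[i] / y[i]], e])); b_ub.append(-1.0)
bounds = [(None, None), (None, None)] + [(0, None)] * n          # a, b free; E_i >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(round(res.fun, 4), round(res.x[0], 1), round(res.x[1], 1))  # minimum MAPE, a, b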
LP model considering costs of overestimation
and underestimation
• Let 𝐶𝑖 (cost), 𝑎 and 𝑏 (regression coefficients) be the decision
variables.
• Objective: Min Z = Σ_{i=1}^{n} C_i
subject to
1. 𝑦ො𝑖 = 𝑎 + 𝑏𝑥𝑖 ; ∀𝑖 ∈ {1,2,3, … , 𝑛}
2. 𝐶𝑖 ≥ 𝑦ො𝑖 − 𝑦𝑖 𝑝𝑜 ; ∀𝑖 ∈ {1,2,3, … , 𝑛}
3. 𝐶𝑖 ≥ (𝑦𝑖 − 𝑦ො𝑖 )𝑝𝑢 ; ∀𝑖 ∈ {1,2,3, … , 𝑛}
𝐶𝑖 ≥ 0; ∀𝑖 ∈ 1,2,3, … , 𝑛
𝑎, 𝑏 unrestricted in sign
• Here, 𝑝𝑜 and 𝑝𝑢 are the unit costs of overestimating and
underestimating the response variable, respectively.
• Special case: 𝑝𝑜 = 𝑝𝑢 = 1. Here, the objective of minimizing the
total cost becomes equivalent to minimizing the mean absolute
deviation.
Assumptions of Linear Regression

• Linearity - Linear relationship exists between the outcome


variable and each independent variable. Scatterplots can
show whether there is a linear or curvilinear relationship.

• Normality - Linear regression assumes that the residuals


(errors) are normally distributed.

• Independence of Errors - Linear regression assumes that


the prediction errors are independent of each other.

• Homoscedasticity - Error variance is the same across


observations and does not depend on the independent
variable(s).
Heteroscedasticity

• Heteroscedasticity is considered to be present in a


regression model if the prediction errors from the
model appear to have a non-uniform variance.

• It is the absence of homoscedasticity.

• It can lead to inaccurate estimates of the standard


errors of regression coefficients and may invalidate
statistical tests of significance that require the
assumption of equal variance.
Detection of Heteroscedasticity

• The simplest way to detect heteroscedasticity is with


the help of a residual plot.

• A residual plot plots the residuals against the


predicted values.

• If heteroscedasticity is absent, the residuals will be


randomly distributed within a narrow, parallel band
centred around the mean (0) and there will be no
pattern among the errors.
Detection of Heteroscedasticity
• If heteroscedasticity is present, there will be a
consistent pattern in the residual plot.

• Additionally, in the case of multiple regression,


separate plots of residuals against each independent
variable may be constructed to identify the ones that
violate homoscedasticity.

• The more rigorous approach to detect


heteroscedasticity involves using statistical tests like
Breusch-Pagan Test, White Test and Levine Test,
which require certain assumptions to be satisfied.
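
The residual-plot check can be sketched as follows; the height/weight data here are synthetic, generated with a deliberately non-constant error variance purely to illustrate the widening band, and are not the athlete data tabulated later.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
height = rng.uniform(1.60, 1.90, 100)
noise = rng.normal(0.0, 3.0, 100) * (height - 1.5) * 10          # error spread grows with height
weight = -131 + 116 * height + noise

b, a = np.polyfit(height, weight, 1)                             # OLS fit: weight = a + b*height
predicted = a + b * height
residuals = weight - predicted

plt.scatter(predicted, residuals)                                # residuals vs fitted values
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted weight")
plt.ylabel("Residual")
plt.show()                                                       # a widening band suggests heteroscedasticity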
Heteroscedasticity and
Homoscedasticity
• Residual Plots for a heteroscedastic and homoscedastic
data set respectively
Remedies for Heteroscedasticity

• Variable Transformation: By applying suitable


transformations to the offending variable(s), the error
variance can be made constant.

• Remediation of other assumption violations:


Sometimes, violation of the assumptions of linearity and
normality may lead to heteroscedasticity in the model. By
remedying those violations, the effect of heteroscedasticity
may disappear.

• Weighted Least Squares: Here, each observation is


“weighted” by an estimate of its variance and then OLS
method is applied on the weighted data to estimate the
parameters.
Remedies for Heteroscedasticity

• Heteroscedasticity-Consistent Standard Errors(HCSE):


• Also known as robust errors, HCSEs are estimates of standard
errors of regression coefficients that have been adjusted for
heteroscedasticity.

• Unlike the first two methods, HCSEs don’t alter the regression
coefficients and are much simpler to use.

• Four different estimates of HCSE are available. For a simple regression Ŷ = a + bX with n observations and the i-th residual û_i, one commonly used form (the HC1 estimate of the standard error of b̂) is:

HCSE_1(b̂) = √{ [n/(n − 2)] · Σ_{i=1}^{n} (X_i − X̄)²·û_i² / [ Σ_{i=1}^{n} (X_i − X̄)² ]² }
Example
• The heights and weights of a sample of 100 athletes in a sports academy are tabulated below. Fit a linear regression model to predict the weight of an athlete from his/her height and check the homoscedasticity assumption using graphical inspection.

Height (m), Weight (kg) — four observation pairs per row:
1.73 71.3 1.69 68.3 1.62 58.9 1.87 92.4
1.85 80.0 1.70 65.1 1.81 85.4 1.83 87.2
1.80 77.5 1.73 67.9 1.66 59.9 1.84 77.7
1.87 87.7 1.74 70.1 1.84 75.1 1.73 73.5
1.89 86.1 1.71 61.8 1.60 54.2 1.66 62.1
1.60 51.9 1.68 65.5 1.89 92.7 1.60 53.5
1.73 73.0 1.70 60.6 1.80 78.3 1.66 60.3
1.68 61.5 1.86 79.8 1.81 80.6 1.80 71.1
1.71 72.4 1.85 92.2 1.59 54.3 1.89 83.0
1.69 67.3 1.66 61.5 1.71 70.3 1.85 78.7
1.84 87.7 1.84 84.3 1.87 82.8 1.77 79.7
1.63 55.3 1.74 64.5 1.77 72.7 1.87 75.5
1.72 64.2 1.78 77.4 1.85 74.2 1.74 72.0
1.71 70.3 1.68 67.1 1.60 52.6 1.84 82.9
1.67 62.4 1.64 56.7 1.75 65.0 1.67 66.4
1.60 56.8 1.66 63.0 1.87 86.8 1.61 54.9
1.75 77.1 1.85 83.6 1.86 78.2 1.77 67.0
1.79 82.1 1.63 57.6 1.62 57.6 1.63 54.6
1.90 97.4 1.61 57.5 1.80 79.3 1.63 60.5
1.64 61.7 1.76 72.0 1.67 59.9 1.86 94.0
1.68 62.8 1.83 78.8 1.83 77.1 1.77 72.7
1.81 84.0 1.68 66.5 1.85 77.3 1.90 92.3
1.60 56.9 1.84 86.7 1.90 81.0 1.67 62.6
1.84 86.1 1.64 56.7 1.76 74.0 1.72 62.4
1.78 81.0 1.67 58.5 1.65 56.3 1.62 54.4
Solution
• The regression equation is determined as:
෣ = −131.18 + 115.97 ∗ 𝐻𝑒𝑖𝑔ℎ𝑡
𝑊𝑒𝑖𝑔ℎ𝑡
• Plotting the residuals against the predicted values, we find that the error variance is not constant but appears to increase roughly in proportion to the predicted value (and hence with the independent variable). Thus, the data set is found to be heteroscedastic by graphical inspection.

(Figure: residual plot of errors against predicted weight, showing a band of residuals that widens as the predicted value increases.)
Variable Selection in Regression

• In most instances of multiple regression, the researcher has


a number of possible independent variables from which to
choose for inclusion in the regression equation.

• Sometimes the set of independent variables is exactly


specified and the regression model is essentially used in a
confirmatory approach.

• But in most instances, the researcher may choose to specify


the variables to be included in the model (by explicit
specification or using the combinatorial approach) or use the
estimation technique to pick and choose among the set of
independent variables with either sequential search or
constrained processes.
Sequential Search Methods

• Sequential search methods are general approach of


estimating the regression equation.
• A set of variables is defined followed by adding or
deleting among these variables until some overall
criterion measure is achieved.

• Two types of sequential search approaches are:

(1) stepwise estimation


(2) forward addition and backward elimination.
Stepwise Estimation
• It enables the examination of each independent variable’s
contribution to the regression model.

• The independent variable with the greatest contribution is


added first.

• Independent variables are then selected for inclusion


based on their incremental contribution over the
variable(s) already in the equation.
Flowchart of the
Stepwise Estimation
Method
Forward Addition And Backward Elimination
• These are largely trial-and-error processes for finding the best
regression estimates.
• The forward addition model is similar to the stepwise procedure in
that it builds the regression equation starting with a single
independent variable.
• The backward elimination procedure starts with a regression
equation including all the independent variables and then deletes
independent variables that do not contribute significantly.

The primary distinction of the stepwise approach from the


forward addition and backward elimination procedures is its
ability to add or delete variables at each stage.
• Once a variable is added or deleted in the forward addition or
backward elimination schemes, the action cannot be reversed at a
later stage.
• Thus, the ability of the stepwise method to both add and delete variables makes it generally the preferred method among forecasters.
Forward Addition Steps
• In the forward addition procedure for variable selection, we first
perform simple linear regression individually for every predictor-
response pair.

• Then, we select the predictor with the highest significance (least 𝑝-


value, provided it is less than the specified significance level) for
entering into the model.

• After the selected predictor is entered into the model, we perform


multiple linear regression to predict the response using every
combination of the selected predictor with the remaining predictors
to determine the next variable to enter the model.

• This process is repeated until no predictor is significant enough to


enter the model for the next stage.
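
A possible Python sketch of the forward addition loop using statsmodels p-values; X is assumed to be a pandas DataFrame of predictors and y the response. Applied to the mileage data of the example that follows, it should select Engine Power first and then Acceleration, matching the worked steps.

import statsmodels.api as sm

def forward_addition(X, y, alpha=0.10):
    """Greedy forward selection: repeatedly add the predictor with the smallest
    p-value, as long as that p-value is below the significance level alpha."""
    selected = []
    remaining = list(X.columns)
    while remaining:
        pvalues = {}
        for candidate in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [candidate]])).fit()
            pvalues[candidate] = model.pvalues[candidate]
        best = min(pvalues, key=pvalues.get)
        if pvalues[best] >= alpha:
            break                                                # no candidate is significant enough
        selected.append(best)
        remaining.remove(best)
    return selected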
Backward Elimination Steps
• In the backward elimination procedure for variable selection, we
first perform multiple linear regression considering all predictors at
once.

• Then, we select the predictor with the least significance (highest 𝑝-


value, provided it is greater than or equal to the specified
significance level) for leaving the model.

• After the selected predictor is removed from the model, we perform


multiple linear regression considering the remaining predictors to
determine the next variable to leave the model.

• This process is repeated until no predictor is insignificant enough


to leave the model at the current stage.
Example
• An automotive magazine aims to study the effect of different
factors on the mileage of a car.

• Preliminary analysis led them towards the conclusion that three


factors - engine displacement, engine power and the time it takes
to accelerate from 0 kmph to 100 kmph – were especially
important in influencing a car’s mileage.

• For their experiment, they considered 38 different models of cars


(all produced in the year 2020) and measured these four variables
for each model. These values are tabulated in the next slide.

• Explore the different variable selection techniques (consider a


significance level of 𝜶 = 𝟎. 𝟏𝟎 for a variable to enter/leave the
model) and fit a suitable regression model to predict mileage using
the input variables.
Example
Mileage (kmpl), Engine displacement (litres), Engine Power (kW), Acceleration 0-100 kmph (sec) — two observations per row:
22.1 5.7 115.6 14.9 23.9 5.2 100.7 15.2
20.3 5.8 105.9 14.3 34.7 2.3 65.6 14.4
25.2 4.4 93.2 15 28.7 2.8 81.3 16.6
24.2 5.9 111.9 13 44.7 1.4 48.5 15.2
39.3 1.6 50.7 16.5 46 1.6 59.7 14.4
36 2.2 70.8 14.2 35.9 2 59.7 15
35.6 2 72.3 14.7 41.3 1.5 52.9 14.9
40.5 1.7 55.9 14.5 38.7 1.6 50.7 16.6
26.6 2.1 76.8 15.9 37.2 2.5 67.1 16
22.3 2.7 93.2 13.6 37.7 2.8 85.8 11.3
28.3 2 85.8 15.7 35.1 2.8 85.8 12.9
21.2 2.7 99.2 15.8 43.9 2.5 67.1 13.2
27 3.8 78.3 15.8 44.8 1.7 52.2 13.2
27.3 3.3 63.4 16.7 41.7 1.4 48.5 19.2
24.4 3.7 82 18.7 48.9 1.5 51.5 14.7
23.7 4.2 89.5 15.1 40 1.6 58.2 14.1
22.3 5 96.9 15.4 28.8 2.4 72.3 14.5
23.1 4.9 96.2 13.4 28.2 2 82 12.8
21.6 5.8 102.9 13.2 41.8 1.5 52.9 14
Forward Addition – Step 1
• Mileage vs. Displacement
Coefficients Standard Error t Stat p-value
Intercept 45.963 1.966 23.379 2.35E-23
Engine displacement (litres) -4.643 0.606 -7.666 4.48E-09

• Mileage vs. Power

Coefficients Standard Error t Stat p-value


Intercept 61.219 2.783 21.996 1.83E-22
Engine Power (kW) -0.379 0.036 -10.673 1.06E-12

• Mileage vs. Acceleration


Coefficients Standard Error t Stat p-value
Intercept 36.990 13.662 2.708 0.010
Acceleration (0-100 kmph) (sec) -0.306 0.915 -0.334 0.740

• Since Engine Power is significant (𝑝-value < 𝛼 = 0.1) and has the
lowest 𝑝-value, it is selected for entry into the model.
Forward Addition – Step 2
• Mileage vs. Power and Displacement
Coefficients Standard Error t Stat p-value
Intercept 59.876 3.412 17.546 6.38E-19
Engine Power (kW) -0.335 0.073 -4.595 5.41E-05
Engine displacement (litres) -0.683 0.989 -0.691 0.494

• Mileage vs. Power and Acceleration


Coefficients Standard Error t Stat p-value
Intercept 87.739 6.674 13.146 4.22E-15
Engine Power (kW) -0.412 0.030 -13.588 1.60E-15
Acceleration (0-100 kmph) (sec) -1.620 0.383 -4.232 1.59E-04

• Since Acceleration is significant and has the lowest 𝑝-value, it is


selected as the second variable to enter into the model.
Forward Addition – Step 3
• Mileage vs. Power, Acceleration and Displacement
Coefficients Standard Error t Stat p-value
Intercept 87.350 7.289 11.984 9.35E-14
Engine Power (kW) -0.404 0.063 -6.422 2.45E-07
Acceleration (0-100 kmph) (sec) -1.610 0.394 -4.091 2.49E-04
Engine displacement (litres) -0.120 0.833 -0.144 0.886

• Since Displacement is insignificant and it is the only remaining


predictor yet to be entered into the model, we find that no further
variable entry is possible and the process terminates.

• The final regression equation based on variable selection by


forward addition procedure is (based on the coefficients from the
last table in Step 2):

Predicted Mileage = 87.739 − 0.412 ∗ Power − 1.620 ∗ Acceleration
Backward Elimination
Step 1:
• Mileage vs. Displacement, Power and Acceleration
Coefficients Standard Error t Stat p-value
Intercept 87.350 7.289 11.984 9.35E-14
Engine displacement (litres) -0.120 0.833 -0.144 0.886
Engine Power (kW) -0.404 0.063 -6.422 2.45E-07
Acceleration (0-100 kmph) (sec) -1.610 0.394 -4.091 2.49E-04

• Since Displacement is insignificant and has the highest 𝑝-value, it is the first
predictor to be removed from the model.
Step 2:
• Mileage vs. Power and Acceleration
Coefficients Standard Error t Stat p-value
Intercept 87.739 6.674 13.146 4.22E-15
Engine Power (kW) -0.412 0.030 -13.588 1.60E-15
Acceleration (0-100 kmph) (sec) -1.620 0.383 -4.232 1.59E-04
• Here, both variables are significant and no further variable removal is
possible. The final regression equation is:

Predicted Mileage = 87.739 − 0.412 ∗ Power − 1.620 ∗ Acceleration
Stepwise Selection – Step 1
• Mileage vs. Displacement
Coefficients Standard Error t Stat p-value
Intercept 45.963 1.966 23.379 2.35E-23
Engine displacement (litres) -4.643 0.606 -7.666 4.48E-09

• Mileage vs. Power

Coefficients Standard Error t Stat p-value


Intercept 61.219 2.783 21.996 1.83E-22
Engine Power (kW) -0.379 0.036 -10.673 1.06E-12

• Mileage vs. Acceleration


Coefficients Standard Error t Stat p-value
Intercept 36.990 13.662 2.708 0.010
Acceleration (0-100 kmph) (sec) -0.306 0.915 -0.334 0.740

• Since Engine Power is significant and has the lowest 𝑝-value, it is


selected for entry into the model.
Stepwise Selection – Step 2
• Mileage vs. Power and Displacement
Coefficients Standard Error t Stat p-value
Intercept 59.876 3.412 17.546 6.38E-19
Engine Power (kW) -0.335 0.073 -4.595 5.41E-05
Engine displacement (litres) -0.683 0.989 -0.691 0.494

• Mileage vs. Power and Acceleration


Coefficients Standard Error t Stat p-value
Intercept 87.739 6.674 13.146 4.22E-15
Engine Power (kW) -0.412 0.030 -13.588 1.60E-15
Acceleration (0-100 kmph) (sec) -1.620 0.383 -4.232 1.59E-04

• Since Acceleration is significant and has the lowest 𝑝-value, it is


selected as the second variable to enter into the model.
• Now, in the stepwise selection procedure, we need to check whether the entry of this new variable has made any previously entered variable insignificant. Clearly, in this case, the previously entered variable (Power) continues to be significant, so both variables are retained going into Step 3.
Stepwise Selection – Step 3
• Mileage vs. Power, Acceleration and Displacement
Coefficients Standard Error t Stat p-value
Intercept 87.350 7.289 11.984 9.35E-14
Engine Power (kW) -0.404 0.063 -6.422 2.45E-07
Acceleration (0-100 kmph) (sec) -1.610 0.394 -4.091 2.49E-04
Engine displacement (litres) -0.120 0.833 -0.144 0.886

• Since Displacement is insignificant and it is the only remaining


predictor yet to be entered into the model, we find that no further
variable entry is possible and the process terminates.

• The final regression equation based on variable selection by


stepwise selection procedure is (based on the coefficients from the
last table in Step 2):

Predicted Mileage = 87.739 − 0.412 ∗ Power − 1.620 ∗ Acceleration
Constrained Methods for Variable
Selection
• These methods are also known as regularisation methods.

• Two widely used methods of this kind are:


• Ridge regression
• LASSO regression

• Both methods add a penalty term (consisting of a function


of regression coefficients) to the loss function to be
minimized for estimating the regression parameters.

• They have the effect of ‘shrinking’ the regression


coefficients, which helps prevent overfitting.
Ridge Regression
• It is also known as L2 regularisation.
• Consider the following multiple linear regression equation:

Ŷ = b0 + b1X1 + b2X2 + ⋯ + bnXn

• Under the Ridge regression approach, a penalty term is added to the loss function; the penalty is the product of the Ridge parameter (𝜆 > 0) and the sum of squares of the regression coefficients:

Loss = Error(y, ŷ) + 𝜆(b1² + b2² + ⋯ + bn²)

• Here, Error(y, ŷ) is some metric of forecast error for the model, e.g. the sum of squared errors.
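As an illustration of the shrinking effect (not part of the original slides), the sketch below fits scikit-learn's Ridge estimator on a small synthetic data set for increasing values of the ridge parameter; scikit-learn's alpha argument plays the role of 𝜆, and all data here are made up:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(38, 3))                       # 38 observations, 3 synthetic predictors
y = 40 - 0.4 * X[:, 0] - 1.6 * X[:, 1] + rng.normal(scale=2.0, size=38)

for lam in (0.01, 1.0, 10.0, 100.0):
    model = Ridge(alpha=lam).fit(X, y)             # alpha = ridge parameter lambda
    print(lam, np.round(model.coef_, 3))           # coefficients shrink towards 0 as lambda grows
```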
LASSO Regression
• LASSO stands for Least Absolute Shrinkage and
Selection Operator.
• It is also known as L1 regularisation.
• It is similar to the Ridge regression approach, the
only difference is that the penalty term includes the
sum of absolute values of regression coefficients
instead of the sum of squares of the regression
coefficients.

• It allows certain regression coefficients to be set to


zero, thereby completely eliminating insignificant
factors from the model.
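A minimal sketch of this zeroing-out behaviour using scikit-learn's Lasso on synthetic data (the data set and the penalty weight are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))                      # 5 candidate predictors, only 2 relevant
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)                 # alpha = L1 penalty weight
print(np.round(lasso.coef_, 3))                    # irrelevant coefficients are driven to exactly 0
```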
Adjusted Coefficient of Determination
• Adjusted 𝑹𝟐 : One disadvantage of the coefficient of
determination (𝑅2 ) is that addition of more predictor variables
will not worsen its value.

• In other words, even if additional non-significant variables are


introduced into the regression model, the value of 𝑅2 will either
stay constant or increase, thereby encouraging overfitting.

• To overcome this drawback, the adjusted 𝑅 2 is calculated. This


metric is adjusted for the no. of predictors (𝑘) in the model and
it increases only when an added variable improves the model
more than what would be expected by chance.

R²_adj = 1 − [(1 − R²)(n − 1)] / (n − k − 1)
Adjusted Coefficient of Determination
• Consider the same mileage example from before (Sample size
𝑛 = 38). The regression equation is:

Predicted Mileage = 87.739 − 0.412 ∗ Power − 1.620 ∗ Acceleration

• The coefficient of multiple determination for this two-predictor


model is found to be:
R² = R²(Pow,Acc) = 0.8411

• The adjusted coefficient of multiple determination for this model is


calculated as:

R²_Adj(Pow,Acc) = 1 − [(1 − R²)(n − 1)] / (n − k − 1) = 1 − [(1 − 0.8411)(38 − 1)] / (38 − 2 − 1)

R²_Adj(Pow,Acc) = 0.8321


Adjusted Coefficient of Determination
• We had previously established during the variable selection
procedure that the third variable (Engine Displacement) is not
significant.

• Now, we deliberately include this variable in the model to see its


effect on R² and R²_adj. The regression equation is:

Predicted Mileage = 87.350 − 0.404 ∗ Power − 1.610 ∗ Acceleration − 0.120 ∗ Displacement

• The coefficient of multiple determination for this three-predictor


model is found to be:

R² = R²(Pow,Acc,Disp) = 0.8412
Adjusted Coefficient of Determination
• We find that there is a marginal increase in the value of 𝑅2
despite adding an insignificant variable.

• Now, we compute the adjusted coefficient of multiple


determination for this three-predictor model.

R²_Adj(Pow,Acc,Disp) = 1 − [(1 − R²)(n − 1)] / (n − k − 1) = 1 − [(1 − 0.8412)(38 − 1)] / (38 − 3 − 1)

R²_Adj(Pow,Acc,Disp) = 0.8272

• We find that the adjusted coefficient of multiple determination


has decreased after the addition of an insignificant variable.
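The adjusted-R² arithmetic above is easy to reproduce; a small helper written for this note (not taken from the slides):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.8411, n=38, k=2))   # ≈ 0.8320 (0.8321 on the slide, which uses an unrounded R²)
print(adjusted_r2(0.8412, n=38, k=3))   # ≈ 0.8272, lower despite the extra predictor
```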
Multicollinearity
Multicollinearity occurs when similar information is
provided by two or more of the predictor variables
in a multiple regression. It can occur in a number
of ways:

1. Two predictors are highly correlated with each other (that is, they
have a correlation coefficient close to +1 or -1). In this case, knowing
the value of one of the variables tells you a lot about the value of the
other variable.

2. A linear combination of predictors is highly correlated with another


linear combination of predictors. In this case, knowing the value of the
first group of predictors tells you a lot about the value of the second
group of predictors. Hence, they are providing similar information.
Measuring Multicollinearity
• The metric commonly used to measure multicollinearity is
the Variance Inflation Factor (VIF).

• VIF is a measure of the degree to which the square of the


standard error of a regression coefficient has been
increased due to multicollinearity.

• For example, suppose the VIF corresponding to a


regression coefficient is 4. This implies that the square of
the standard error of that coefficient has been increased by
a factor of 4, or the standard error of that coefficient has
increased by a factor of √4 = 2, due to the effects of
multicollinearity.

• This might have led to incorrect inferences of the associated


variable’s significance in the regression model.
Variance Inflation Factor
• Considering the following MLR model (with n predictors):

𝑌෠ = 𝑏0 + 𝑏1 𝑋1 + 𝑏2 𝑋2 + ⋯ + 𝑏𝑛 𝑋𝑛

• The VIF for each predictor 𝑘 is given by:

VIF_k = 1 / (1 − R_k²)
• Where 𝑅𝑘2 is the coefficient of multiple determination
considering 𝑋𝑘 as the dependent variable and all other 𝑋𝑖
(𝑖 ∈ {1,2,3, … , 𝑛}\{𝑘}) as independent variables

• The amount of 𝑉𝐼𝐹𝑘 that represents an acceptable level of


multicollinearity is highly problem-specific and difficult to
generalize.
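A sketch of this VIF computation with statsmodels, regressing each predictor on the others exactly as the formula describes (a ready-made variance_inflation_factor helper also exists in statsmodels.stats.outliers_influence):

```python
import pandas as pd
import statsmodels.api as sm

def vif(X: pd.DataFrame) -> pd.Series:
    """VIF_k = 1 / (1 - R_k^2), with R_k^2 from regressing X_k on the other predictors."""
    out = {}
    for col in X.columns:
        others = sm.add_constant(X.drop(columns=col))
        r2 = sm.OLS(X[col], others).fit().rsquared
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out)

# For the mileage model with only Power and Acceleration retained, both VIFs are
# identical and roughly 1 / (1 - 0.0638) ≈ 1.07, as computed on the next slides.
```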
Variance Inflation Factor
• Consider the same mileage example from before. The
regression equation is:

Predicted Mileage = 87.739 − 0.412 ∗ Power − 1.620 ∗ Acceleration

• Since there are only two significant predictors in this case,


their respective 𝑉𝐼𝐹s will be one and the same.

• Treating 𝑃𝑜𝑤𝑒𝑟 as the dependent variable and 𝐴𝑐𝑐𝑒𝑙𝑒𝑟𝑎𝑡𝑖𝑜𝑛


as the independent variable, regression is performed and
the best fit model is found to be:

Predicted Power = 123.29 − 3.192 ∗ Acceleration
Variance Inflation Factor
• The coefficient of determination for this regression is found
to be:

R²_Power = 0.0638

• The VIF is given by:

VIF_Power = 1 / (1 − R²_Power) = 1 / (1 − 0.0638) = 1 / 0.9362 = 1.0682

• Here, the VIF is only marginally greater than 1, implying that


there is very little correlation between the predictors, and
hence, the effects of multicollinearity are not significant in
this model.
Stationarity in time-series

• Stationary time-series is one whose properties do not


depend on the time at which the series is observed.

• It does not matter when you observe it, it should look


much the same at any period of time.

• So time-series with trends, or with seasonality, are not


stationary – the trend and seasonality will affect the
value of the time-series at different times.

• In general, a stationary time-series will have no


predictable patterns in the long-term.
Stationarity in time-series

• In general, a series of observations over time {𝑦𝑡 } is


stationary if the following conditions hold:

𝜇𝑡 = 𝜇; ∀𝑡
𝜎𝑡 = 𝜎; ∀𝑡
𝜌(𝑦𝑡 , 𝑦𝑡+ℎ) = 𝜌ℎ ; ∀𝑡

• In other words, for a series to be stationary, the mean


and variance of the time-series must not change over
time and the correlation coefficient of any pair of
observations that are ℎ periods apart must be the same,
i.e., it should be independent of their actual positions in
the time-series.
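Beyond visual inspection, a common practical check (not covered on these slides) is the Augmented Dickey-Fuller test available in statsmodels; a small sketch on synthetic series:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
white_noise = rng.normal(size=200)                                   # stationary
trending = np.cumsum(rng.normal(size=200)) + 0.5 * np.arange(200)    # random walk plus trend

for name, series in [("white noise", white_noise), ("trending", trending)]:
    stat, pvalue = adfuller(series)[:2]
    print(f"{name}: ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}")
# A small p-value (e.g. < 0.05) rejects the unit-root null, suggesting the series is stationary.
```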
Which of these are stationary?
• Obvious seasonality rules out series (d), (h) and (i).

• Trend rules out series (a), (c), (e), (f) and (i).

• Increasing variance also rules out (f).

• That leaves only (b) and (g) as stationary series.


Differencing
• Computing the differences between consecutive
observations is known as differencing.

• It is one way of converting a non-stationary time-


series into a stationary time-series.

• If 𝑦𝑡 and 𝑦𝑡−1 are the observations of the


response variable in periods 𝑡 and 𝑡 − 1
respectively, then the differenced observation in
period 𝑡 is given by,

𝑦𝑡′ = 𝑦𝑡 − 𝑦𝑡−1
Differencing
• Transformations such as logarithms can help to
stabilize the variance of a time-series.

• Differencing can help stabilize the mean of a


time-series by removing changes in the level of
a time-series.

• It can also help in eliminating (or reducing) the


effects of trend and seasonality.
Second-Order Differencing
• In many cases, differencing only once may not be
sufficient to make a non-stationary time-series
stationary.

• However, in some of those cases, the differences of


those differences may be stationary. Such differencing
is called second-order differencing.


𝑦𝑡′′ = 𝑦𝑡′ − 𝑦′𝑡−1 = (𝑦𝑡 − 𝑦𝑡−1) − (𝑦𝑡−1 − 𝑦𝑡−2)

𝑦𝑡′′ = 𝑦𝑡 − 2𝑦𝑡−1 + 𝑦𝑡−2


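A quick illustration of first- and second-order differencing with pandas, using the opening values of the paper-production series that appears later in this deck:

```python
import pandas as pd

y = pd.Series([2740, 2805, 2835, 2840, 2895, 2905])   # first few paper-production values

d1 = y.diff()           # y'_t  = y_t  - y_{t-1}
d2 = y.diff().diff()    # y''_t = y'_t - y'_{t-1} = y_t - 2*y_{t-1} + y_{t-2}
print(d1.tolist())      # [nan, 65.0, 30.0, 5.0, 55.0, 10.0]
print(d2.tolist())      # [nan, nan, -35.0, -25.0, 50.0, -45.0]
```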
Auto Regressive Models/Process

• In a multiple regression model, we forecast the


variable of interest using a linear combination of
predictors.

• In an autoregression model, we forecast the


variable of interest using a linear combination of
past values of the variable.

• The term autoregression indicates that it is a


regression of the variable against itself.
• Thus an autoregressive model of order p can be written as:

𝑦𝑡 = 𝑐 + 𝜑1𝑦𝑡−1 + 𝜑2𝑦𝑡−2 + ⋯ + 𝜑𝑝𝑦𝑡−𝑝 + 𝜀𝑡

• where 𝜀𝑡 is white noise.

• This is like a multiple regression, but with lagged values of


𝑦𝑡 as predictors. We refer to this as an AR(p) model.

• Autoregressive models are remarkably flexible at handling


a wide range of different time-series patterns.

• The variance of the error term 𝜀𝑡 will only change the


scale of the series, not the patterns.
AR(1) Process
• Consider the following AR(1) process:

𝐴𝑅 1 ∶ (𝑋𝑡 − 𝜇) = 𝜑1 (𝑋𝑡−1 − 𝜇) + 𝜀𝑡

where, 𝑋𝑡 is the observed response in period 𝑡,


𝜇 is the mean of the observed responses,
𝜑1 is the coefficient of the lagged response,
𝜀𝑡 is the residual in period 𝑡

• For simplicity, let the mean-adjusted observations be


represented by 𝑋ሶ 𝑡 = 𝑋𝑡 − 𝜇

• The AR(1) process can be written as:


𝑋ሶ 𝑡 = 𝜑1 𝑋ሶ 𝑡−1 + 𝜀𝑡
AR(1) Process
• Let 𝑋෠𝑡 be the predicted response in period 𝑡. The regression
equation can be written as:

𝑋෠𝑡 = 𝑋𝑡 − 𝜀𝑡 = (𝜇 + 𝑋ሶ 𝑡 ) − 𝜀𝑡 = (𝜇 + 𝜑1 𝑋ሶ 𝑡−1 + 𝜀𝑡 ) − 𝜀𝑡

𝑋෠𝑡 = 𝜇 + 𝜑1 𝑋ሶ 𝑡−1

• An AR(1) process aims to predict the response using a


single lagged value as a predictor and hence, it is analogous
to simple linear regression.

• The OLS estimate of 𝜑1 is given by,

𝜑1 = 𝐶𝑜𝑣(𝑋ሶ 𝑡 , 𝑋ሶ 𝑡−1) / 𝑉𝑎𝑟(𝑋ሶ 𝑡−1)
Example
• Consider the following demand for TV sets over a 12-month period.
Using an AR(1) process, predict the demand for the 13th month. Also,
compare the AR(1) model with an MA(1) model and a five-period simple
moving average model using any error metric.

Period Demand Xt
1 106
2 110
3 118
4 105
5 115
6 100
7 112
8 106
9 118
10 102
11 112
12 110
AR(1) Process
Period Demand Xt 𝑋ሶ 𝑡 𝑋ሶ 𝑡−1 𝑋ሶ 𝑡 ∗ 𝑋ሶ 𝑡−1 (𝑋ሶ 𝑡)²
1 106 -3.5 12.25
2 110 0.5 -3.5 -1.75 0.25
3 118 8.5 0.5 4.25 72.25
4 105 -4.5 8.5 -38.25 20.25
5 115 5.5 -4.5 -24.75 30.25
6 100 -9.5 5.5 -52.25 90.25
7 112 2.5 -9.5 -23.75 6.25
8 106 -3.5 2.5 -8.75 12.25
9 118 8.5 -3.5 -29.75 72.25
10 102 -7.5 8.5 -63.75 56.25
11 112 2.5 -7.5 -18.75 6.25
12 110 0.5 2.5 1.25 0.25
Sum 1314 -256.25 379
Average 109.5 -23.3 31.6
AR(1) Process
• The data from the previous table can be used to
estimate the value of 𝜑1 for the AR Process.

𝜑1 = 𝐶𝑜𝑣(𝑋ሶ 𝑡 , 𝑋ሶ 𝑡−1) / 𝑉𝑎𝑟(𝑋ሶ 𝑡−1)

• Since every 𝑋ሶ 𝑡−1 is nothing but the lagged value of the


corresponding 𝑋ሶ 𝑡 , their variances are equal. Hence,
𝑉𝑎𝑟(𝑋ሶ 𝑡−1 ) can be replaced by 𝑉𝑎𝑟(𝑋ሶ 𝑡 )

𝜑1 = 𝐶𝑜𝑣(𝑋ሶ 𝑡 , 𝑋ሶ 𝑡−1) / 𝑉𝑎𝑟(𝑋ሶ 𝑡) = 𝐸(𝑋ሶ 𝑡 𝑋ሶ 𝑡−1) / 𝐸(𝑋ሶ 𝑡)² = −23.3 / 31.6 ≈ −0.7
AR(1) Process
• The regression model can then be written as:

𝑋෠𝑡 = 𝜇 + 𝜑1 𝑋ሶ 𝑡−1

𝑋෠𝑡 = 109.5 − 0.7𝑋ሶ 𝑡−1

• For period 𝑡 = 13, the forecasted value is:

𝑋෠13 = 109.5 − 0.7𝑋ሶ12

𝑋෠13 = 109.5 − 0.7 ∗ 0.5

𝑋෠13 = 109.2
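The slide's AR(1) arithmetic can be reproduced directly; the sketch below follows the averaging convention used in the table (lagged products averaged over 11 terms, squared deviations over 12), so the unrounded 𝜑1 comes out near −0.74 rather than the rounded −0.7:

```python
import numpy as np

demand = np.array([106, 110, 118, 105, 115, 100, 112, 106, 118, 102, 112, 110], dtype=float)
mu = demand.mean()                    # 109.5
x = demand - mu                       # mean-adjusted series

num = np.mean(x[1:] * x[:-1])         # ≈ -23.3  (average of the 11 lagged products)
den = np.mean(x ** 2)                 # ≈ 31.6   (average of the 12 squared deviations)
phi1 = num / den                      # ≈ -0.74  (rounded to -0.7 on the slide)

forecast_13 = mu + phi1 * x[-1]       # ≈ 109.1  (109.2 on the slide, which uses -0.7)
print(round(phi1, 2), round(forecast_13, 1))
```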
AR(1) Process
• Residual sum of squares from forecasts using AR(1) model:
Period Demand Xt Forecast 𝑋෠𝑡 Error 𝜀𝑡 𝜀𝑡²
1 106
2 110 112 -2 4
3 118 109.2 8.8 77.44
4 105 103.6 1.4 1.96
5 115 112.7 2.3 5.29
6 100 105.7 -5.7 32.49
7 112 116.2 -4.2 17.64
8 106 107.8 -1.8 3.24
9 118 112.2 5.8 33.64
10 102 103.6 -1.6 2.56
11 112 115 -3 9
12 110 109.2 0.8 0.64
Sum 187.9
Average 17.08
Simple 5-Period Moving Average(MA) Model

Period Demand Xt Forecast 𝑋෠𝑡 Error 𝜀𝑡 𝜀𝑡²
1 106
2 110
3 118
4 105
5 115
6 100 110.8 -10.8 116.6
7 112 109.6 2.4 5.76
8 106 110 -4 16
9 118 107.6 10.4 108.2
10 102 110.2 -8.2 67.24
11 112 107.6 4.4 19.36
12 110 110 0 0
Sum 1314 333.2
Average 109.5 47.6
Moving Average Model
• A Moving Average (MA) model is similar to an AR model,
but uses lagged residuals instead of lagged observations
to predict the response.

• An MA model of order 𝑞 is expressed as:

𝑀𝐴 𝑞 : 𝑋𝑡 − 𝜇 = 𝜀𝑡 − 𝜃1 𝜀𝑡−1 − 𝜃2 𝜀𝑡−2 − ⋯ − 𝜃𝑞 𝜀𝑡−𝑞

• Here, the sign convention (negative coefficients for lagged


errors) followed by Box and Jenkins is used.

• The most simple type of MA model (MA(1)) is given by:

𝑀𝐴 1 : 𝑋𝑡 − 𝜇 = 𝜀𝑡 − 𝜃1 𝜀𝑡−1
Moving Average Model

• The corresponding regression equation is:

𝑋෠𝑡 = 𝜇 − 𝜃1 𝜀𝑡−1

• Estimating MA parameters is more difficult than estimating


AR parameters, as the lagged error terms are not
observable.

• Iterative non-linear fitting procedures need to be used


instead of OLS for estimating the MA parameters.
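In practice this iterative fitting is delegated to a library; a sketch using statsmodels, where an MA(1) model is specified as ARIMA(0, 0, 1) and 𝜃1 is estimated by maximum likelihood (note that statsmodels writes the MA term with a plus sign, the opposite of the Box-Jenkins sign convention used above):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

demand = np.array([106, 110, 118, 105, 115, 100, 112, 106, 118, 102, 112, 110], dtype=float)

ma1 = ARIMA(demand, order=(0, 0, 1)).fit()   # p=0, d=0, q=1  ->  MA(1) with a constant
print(ma1.params)                            # estimated mean, MA(1) coefficient, error variance
print(ma1.forecast(steps=1))                 # one-step-ahead forecast for period 13
```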
Moving Average Model
• For the same example, an MA(1) model with 𝜃1 = 0.5 is fit.
• Here 𝜇 = 109.5, therefore 𝑋෠𝑡 = 109.5 − 0.5 ∗ 𝜀𝑡−1

Period Demand Xt Forecast 𝑋෠𝑡 Error 𝜀𝑡 𝜀𝑡²
1 106
2 110 111.3 -1.3 1.7
3 118 110.2 7.8 60.8
4 105 105.6 -0.6 0.4
5 115 109.8 5.2 27.0
6 100 106.9 -6.9 47.6
7 112 113 -1 1
8 106 110 -4 16
9 118 111.5 6.5 42.3
10 102 106.3 -4.3 18.5
11 112 111.7 0.3 0.1
12 110 109.4 0.6 0.4
Sum 1314 215.7
Average 109.5 19.6
ARMA(𝑝, 𝑞) Models
• An ARMA model combines features of AR models and
MA models.

• It is defined by two parameters 𝑝 and 𝑞, where 𝑝 is the


order of the AR part of the model and 𝑞 is the order of
the MA part of the model.

• In other words, an ARMA(𝑝, 𝑞) model will use 𝑝 lagged


observations and 𝑞 lagged residuals to predict the
response.

• A little consideration will show that an AR(𝑝) process


and an MA(𝑞) process are special cases of an
ARMA(𝑝, 𝑞) process obtained by setting 𝑞 = 0 and 𝑝 =
0, respectively.
ARMA (𝑝, 𝑞) Models

➢ ARMA(1, 1):
(𝑋𝑡 − 𝜇) = 𝜑1 (𝑋𝑡−1 − 𝜇) + 𝜀𝑡 + 𝜃1 𝜀𝑡−1

➢ ARMA(2, 1):
(𝑋𝑡 − 𝜇) = 𝜑1 (𝑋𝑡−1 − 𝜇) + 𝜑2 (𝑋𝑡−2 − 𝜇) + 𝜀𝑡 + 𝜃1 𝜀𝑡−1

➢ ARMA(1, 2):
(𝑋𝑡 − 𝜇) = 𝜑1 (𝑋𝑡−1 − 𝜇) + 𝜀𝑡 + 𝜃1 𝜀𝑡−1 + 𝜃2 𝜀𝑡−2
ARIMA(𝒑, 𝒅, 𝒒) Models
• ARIMA(𝑝, 𝑑, 𝑞) models are extensions of ARMA(𝑝, 𝑞)
models that incorporate differencing, thereby allowing
non-stationary time-series to be modelled.

• The ‘I’ in ARIMA stands for ‘Integrated’, which represents


the use of differencing of observations to make the time-
series stationary.

• In addition to using 𝑝 lagged observations and 𝑞 lagged


residuals, an ARIMA model includes another parameter 𝑑,
which denotes the degree of differencing (the number of
times the raw observations have been differenced).
ARIMA(𝒑, 𝒅, 𝒒) Models
• These models have become very popular forecasting
tools over recent years because of their high versatility,
flexibility and ability to handle complex time-series that
cannot be easily modelled by other approaches.

• Virtually any other forecasting model can be described as


a special case of an ARIMA(𝑝, 𝑑, 𝑞) model.

• Sophisticated ARIMA implementations are included with


most modern day statistical software packages.
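For example, statsmodels exposes ARIMA(p, d, q) directly through its order argument; a minimal sketch on a synthetic trending series (the data here are made up purely for illustration):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(loc=0.5, scale=1.0, size=120)) + 100   # synthetic non-stationary series

model = ARIMA(y, order=(1, 1, 1)).fit()   # 1 lagged observation, 1 difference, 1 lagged residual
print(model.summary())                    # coefficient estimates, AIC/BIC, diagnostics
print(model.forecast(steps=6))            # forecasts for the next 6 periods
```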
Box-Jenkins Modelling

➢ The Box-Jenkins method is a systematic approach to


modelling ARIMA processes.

➢ It was introduced by statisticians George Box and


Gwilym Jenkins in 1970 and greatly popularized
through a book that they authored.

➢ It involves identifying a suitable ARIMA model for a


time-series data by analysing its features, fitting that
model to the data and using it to make forecasts.
Box-Jenkins Modelling

➢ The Box-Jenkins approach can be summarized in 5


steps:

1. Data preparation
2. Model selection
3. Parameter estimation
4. Model Checking
5. Forecasting
Box-Jenkins Modelling
➢ Data Preparation:
➢ This involves cleaning the data and performing necessary
transformations and differencing to make the time series
stationary, to stabilize the variance of the data and minimize
the effects of obvious patterns like trend and seasonality.
The order of differencing (𝑑) is generally decided at this
stage.

➢ Model Selection:
➢ The model selection may be done by using plots of
Autocorrelation function (ACF), Partial Autocorrelation
function (PACF) and Extended Autocorrelation function
(EACF). These plots assist in deciding how many lagged
observations (𝑝) and residuals (𝑞) are needed.
Box-Jenkins Modelling
• Estimation:

• Once a model has been identified (i.e., the values of 𝑝


and 𝑞), we need to estimate the parameters 𝜑𝑖 (𝑖 =
1,2,3, … , 𝑝) and 𝜃𝑖 (𝑖 = 1,2,3, … , 𝑞).

• The maximum likelihood estimation (MLE) technique finds


the values of the parameters which maximize the
probability of obtaining the data that we have observed.

• For ARIMA models, MLE gives results very similar to the least squares estimates that would be obtained by minimizing the sum of squared residuals.
Box-Jenkins Modelling
➢ Model Checking:
➢ This step involves validating a developed model both in
terms of its accuracy and its interpretability. Usually,
multiple competing models are developed and the best
of those according to the desired criteria (AIC, BIC,
RMSE, etc.) is retained for forecasting.

➢ Forecasting:
➢ Once the model has been deemed fit for purpose, the
final step is to use the model to make forecasts.
Feedback from the performance of the model may be
used to make further minor modifications to it and
refine it.
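A compressed, illustrative version of steps 2 to 5 (model selection by information criterion, estimation, checking and forecasting), assuming the differencing order d = 1 has already been fixed in step 1 and using a synthetic stand-in series:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(loc=0.5, size=120)) + 100            # stand-in for the prepared series

candidates = [(p, 1, q) for p in range(3) for q in range(3)]  # small grid of ARIMA(p, 1, q) models
fits = {order: ARIMA(y, order=order).fit() for order in candidates}
best = min(fits, key=lambda order: fits[order].aic)           # model checking via AIC

print("selected order:", best, "AIC:", round(fits[best].aic, 1))
print(fits[best].forecast(steps=6))                           # final step: forecasting
```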
ARIMA Example
• Consider the same paper production example from before:
Period Paper_Prod Period Paper_Prod
Jan-17 2740 Jan-19 3870
Feb-17 2805 Feb-19 3850
Mar-17 2835 Mar-19 3810
Apr-17 2840 Apr-19 3800
May-17 2895 May-19 3790
Jun-17 2905 Jun-19 3820
Jul-17 2990 Jul-19 3910
Aug-17 3070 Aug-19 3980
Sep-17 3185 Sep-19 4030
Oct-17 3275 Oct-19 4110
Nov-17 3320 Nov-19 4195
Dec-17 3305 Dec-19 4235
Jan-18 3285 Jan-20 4325
Feb-18 3255 Feb-20 4395
Mar-18 3235 Mar-20 4475
Apr-18 3225 Apr-20 4510
May-18 3260 May-20 4495
Jun-18 3345 Jun-20 4470
Jul-18 3405 Jul-20 4450
Aug-18 3595 Aug-20 4435
Sep-18 3725 Sep-20 4425
Oct-18 3790 Oct-20 4485
Nov-18 3850 Nov-20 4585
Dec-18 3875 Dec-20 4635
Solution
• From a plot of the production data, it can be clearly seen
that the data exhibits cyclicity as well as an increasing
trend.
[Plot: Paper Production (roughly 2700 to 4700) against Period (0 to 48), showing cyclic fluctuations around an increasing trend]
Solution
➢ Data Preparation:
➢ An initial analysis of the data reveals no extreme outliers or
sudden changes of magnitude or missing observations.

➢ The data however contains trend and cyclicity, making it


non-stationary, and hence, differencing is probably needed.

➢ Normally, the correct amount of differencing is the lowest


order (𝑑) of differencing that yields a time series which
fluctuates around a well-defined mean value, and exhibits
no long-term trends.

➢ We initially set 𝑑 = 1 and plot the resulting time-series.


Solution
• A plot of the data after first-order differencing shows that the series is now approximately stationary. Hence, we move on to step 2.
Solution
➢ Model Selection:
➢ In this step, the number of AR terms (𝑝) and MA terms (𝑞) in
to be used in the ARIMA model are decided.

➢ For this purpose, we plot the Auto-Correlation Function


(ACF) and the Partial Auto-Correlation Function (PACF) for
our stationarized series.

➢ Usually, we may not need both AR and MA terms. It may be


sufficient to just have one of the two.

➢ As a rule of thumb, if the lag-1 correlation (first bar in the


ACF plot) is positive, AR terms work best, and if the lag-1
correlation is negative, MA terms work best.
Solution
• The ACF plot for the stationarized data:
Solution
• The PACF plot for the stationarized data:
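The two plots referred to above can be produced with statsmodels' plotting helpers; a sketch, with a random placeholder standing in for the differenced production series:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(0)
y_diff = rng.normal(size=47)                 # placeholder for the first-differenced series

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(y_diff, lags=20, ax=axes[0])        # ACF: guides the choice of the MA order q
plot_pacf(y_diff, lags=20, ax=axes[1])       # PACF: guides the choice of the AR order p
plt.show()
```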
Neural Network Models in Forecasting

➢ Neural Networks (NNs) are mathematical models


inspired by the functioning of biological neurons.

➢ NNs are built from simple processing elements


(PEs) called nodes, units, or neurons, and contain
certain adjustable parameters ("weights").

➢ NNs use rules to modify these parameters upon


the presentation of data, specifically to "learn from
the environment”.
Neural Network Models in Forecasting
➢ Given sufficient data, NNs are well suited to the
task of forecasting.

➢ NNs excel at pattern recognition and forecasting


from pattern clusters.

➢ They allow modelling of complex nonlinear


relationships between the response variable and
its predictors.
Architecture of NNs
➢ The general architecture of NNs consists of an
interconnected system of neurons arranged in a
multi-layered structure, i.e., an input layer, an
output layer and one or more optional hidden
layers.

➢ The simplest networks contain no hidden layers


and are equivalent to linear regressions.

➢ The number of inputs generally corresponds to the


number of predictor variables in the model.
A Simple NN
An NN with one Hidden Layer
Structure of a Neuron
Working of Simple NNs
• Each input is initially assigned a weight.

• A linear combination of these weighted inputs along with a


bias term is fed into the activation function (sigmoid, logistic,
arctan, etc.,), which then transforms it into a scaled output
between 0 and 1.

• The deviation of this output from its target value is then used
to adjust the weights.

• By using a sufficiently large number of observations, the


weights (regression coefficients) corresponding to each input
may be optimally tuned.
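A bare-bones numerical sketch of this loop for a single neuron with a sigmoid activation, trained by gradient descent on synthetic data (all names and numbers here are illustrative, not taken from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                          # 200 observations, 2 inputs
target = sigmoid(1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.3)  # targets scaled between 0 and 1

w, b, lr = np.zeros(2), 0.0, 1.0
for _ in range(5000):
    out = sigmoid(X @ w + b)                   # weighted inputs + bias -> activation
    grad = (out - target) * out * (1 - out)    # derivative of squared error w.r.t. pre-activation
    w -= lr * (X.T @ grad) / len(X)            # adjust weights using the output deviation
    b -= lr * grad.mean()

print(np.round(w, 2), round(b, 2))             # moves towards (1.5, -2.0) and 0.3 as training proceeds
```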
Thank You !!!
