Multiple Linear Regression
• A model with more variables can have higher 𝑅𝑆𝐸 if the decrease in 𝑅𝑆𝑆 is small relative to the
increase in the number of variables (𝑝).
Model Fit for Advertising Data Set
Model  Predictors               𝑅²    Adjusted 𝑅²  𝑅𝑆𝐸
1      TV                       0.61  0.61         3.26
2      Radio                    0.33  0.33         4.28
3      Newspaper                0.05  0.05         5.09
4      TV & Radio               0.90  0.90         1.68
5      TV & Newspaper           0.65  0.64         3.12
6      Radio & Newspaper        0.33  0.33         4.28
7      TV, Radio & Newspaper    0.90  0.90         1.69
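The effect of an extra predictor on 𝑅𝑆𝐸 can be checked numerically. The sketch below uses the standard definition 𝑅𝑆𝐸 = sqrt(𝑅𝑆𝑆 / (𝑛 − 𝑝 − 1)); the 𝑅𝑆𝑆 values and the sample size are illustrative assumptions, not figures taken from the table, and the helper name `rse` is hypothetical.

```python
import math

def rse(rss: float, n: int, p: int) -> float:
    """Residual standard error: sqrt(RSS / (n - p - 1))."""
    return math.sqrt(rss / (n - p - 1))

# Illustrative (assumed) values: a third predictor lowers RSS only slightly.
n = 200
rse_two = rse(556.9, n, p=2)    # model with two predictors
rse_three = rse(556.8, n, p=3)  # model with three predictors

print(f"p = 2: RSE = {rse_two:.4f}")
print(f"p = 3: RSE = {rse_three:.4f}")
print("RSE increased:", rse_three > rse_two)
```

Because the denominator 𝑛 − 𝑝 − 1 shrinks by one while 𝑅𝑆𝑆 barely moves, the ratio grows, which matches the pattern seen when comparing Models 4 and 7.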
Two Types of Predictions
• We will develop two types of interval estimates regarding 𝑦:
1. A confidence interval for the expected value of 𝑦.
2. A prediction interval for an individual value of 𝑦.
• It is common to refer to the first as a confidence interval and the
second as a prediction interval.
Confidence Interval vs Prediction Interval
• In order to assess the uncertainty associated with the predicted response,
consider the following two cases:
➢ How should we quantify the uncertainty associated with the average sales
over a number of markets, given that $100,000 is spent on TV advertising,
$20,000 is spent on Radio advertising, and $10,000 is spent on Newspaper
advertising in each market?
➢ How should we quantify the uncertainty associated with the sales of a
particular market, given that $100,000 is spent on TV advertising, $20,000 is
spent on Radio advertising, and $10,000 is spent on Newspaper advertising in
that market?
The Confidence Interval
• The point estimate of $E(y^0)$ is the fitted value $\hat{y}^0$:
$\hat{y}^0 = b_0 + b_1 x_1^0 + b_2 x_2^0 + \cdots + b_p x_p^0$.
• For specific values of $x_1, \ldots, x_p$, denoted by $x_1^0, x_2^0, \ldots, x_p^0$, the $100(1 - \alpha)\%$
confidence interval for the expected value of $y$ is computed as
$\hat{y}^0 \pm t_{\alpha/2,\,df}\, se(\hat{y}^0)$,
where $df = n - p - 1$.
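As a rough illustration of the interval above, the following sketch computes the point estimate, its standard error, and the confidence interval with NumPy and SciPy. It relies on the standard OLS result that the standard error of the fitted mean at a point equals s times the square root of a0' (A'A)^(-1) a0, where A is the design matrix with an intercept column and a0 is the point of interest with a leading 1; the slides do not state this formula. The function name `mean_response_ci` and the synthetic data are assumptions made for the example.

```python
import numpy as np
from scipy import stats

def mean_response_ci(X, y, x0, alpha=0.05):
    """Confidence interval for E(y) at x0 from an OLS fit (X has no intercept column)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept
    b, *_ = np.linalg.lstsq(A, y, rcond=None)   # b0, b1, ..., bp
    resid = y - A @ b
    df = n - p - 1
    s2 = resid @ resid / df                     # estimated error variance
    a0 = np.concatenate([[1.0], np.asarray(x0, dtype=float)])
    y_hat0 = a0 @ b                             # point estimate of E(y) at x0
    se = np.sqrt(s2 * (a0 @ np.linalg.inv(A.T @ A) @ a0))
    t_crit = stats.t.ppf(1 - alpha / 2, df)     # t_{alpha/2, df}
    return y_hat0, se, (y_hat0 - t_crit * se, y_hat0 + t_crit * se)

# Synthetic data (assumed purely for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(60, 3))
y = 3.0 + X @ np.array([0.05, 0.2, 0.0]) + rng.normal(0, 1.5, size=60)
x0 = [50.0, 20.0, 10.0]

y_hat0, se0, ci = mean_response_ci(X, y, x0)
print(f"y_hat0 = {y_hat0:.3f}, se = {se0:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```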
Computation of Confidence Interval in Excel
• Many statistical programs provide confidence intervals directly. Excel
does not provide it directly.
• However, one can still compute it using a simple trick.
• To obtain $\hat{y}^0$ together with $se(\hat{y}^0)$, we first estimate a modified regression
model where $y$ is the response variable and the explanatory variables are
defined as $x_1^* = x_1 - x_1^0$, $x_2^* = x_2 - x_2^0$, …, $x_p^* = x_p - x_p^0$.
• The resulting estimate of the intercept and its standard error equal $\hat{y}^0$
and $se(\hat{y}^0)$, respectively.
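The shifting trick is not specific to Excel. The sketch below, on synthetic data, fits the original and the shifted regression with NumPy and compares the shifted model's intercept with the prediction computed directly from the original coefficients. The helper `ols_with_se` and all data values are assumptions for illustration only.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS with an intercept; returns coefficients and their standard errors."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    s2 = resid @ resid / (n - p - 1)      # estimated error variance
    cov = s2 * np.linalg.inv(A.T @ A)     # covariance matrix of the coefficients
    return b, np.sqrt(np.diag(cov))

# Synthetic data (assumed purely for illustration).
rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(50, 3))
y = 3.0 + X @ np.array([0.05, 0.2, 0.0]) + rng.normal(0, 1.5, size=50)
x0 = np.array([100.0, 20.0, 10.0])        # point of interest

# Direct prediction from the original regression.
b, _ = ols_with_se(X, y)
y_hat0_direct = b[0] + x0 @ b[1:]

# Modified regression on the shifted predictors x* = x - x0.
b_shift, se_shift = ols_with_se(X - x0, y)
y_hat0_trick, se_y_hat0 = b_shift[0], se_shift[0]

print("direct prediction  :", round(float(y_hat0_direct), 4))
print("shifted intercept  :", round(float(y_hat0_trick), 4))
print("se of the intercept:", round(float(se_y_hat0), 4))
```

The slope estimates are unchanged by the shift; only the intercept moves, because at x* = 0 the shifted model is evaluated exactly at the point of interest.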
Advertising Data Set
For the Advertising data, we first shift each predictor by its value of interest
(TV = 100, Radio = 20, Newspaper = 10):
Obs.  Sales  TV     Radio  Newspaper  TV − 100  Radio − 20  Newspaper − 10
1     22.1   230.1  37.8   69.2       130.1     17.8        59.2
2     10.4   44.5   39.3   45.1       −55.5     19.3        35.1
3     9.3    17.2   45.9   69.3       −82.8     25.9        59.3
• Estimating the modified regression now reveals the confidence interval:
           Coefficients  Std. error  t-statistic  p-value  Lower 95%  Upper 95%
Intercept  11.276        0.175       64.348       0.000    10.930     11.621
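The reported bounds can be reproduced from the intercept and its standard error. This sketch assumes the Advertising data contain 200 observations, so that df = 200 − 3 − 1 = 196; the sample size is not stated in the output above. Small differences in the last digit come from the rounded inputs.

```python
from scipy import stats

intercept = 11.276   # estimate of the expected response, from the regression output
std_error = 0.175    # its standard error, from the regression output
n, p = 200, 3        # assumption: 200 markets, 3 predictors
df = n - p - 1

t_crit = stats.t.ppf(0.975, df)   # two-sided 95% critical value
lower = intercept - t_crit * std_error
upper = intercept + t_crit * std_error

print(f"t critical value = {t_crit:.4f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
```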