Chap 10
Slide 1/99
Introduction Regression Model Inference Assessing the model Inference for the Response Summary
Outline
Introduction
Separate Means model c.f. Regression
Regression model
Regression Model
Parameter Estimates
Model Assumptions
Inference
ANOVA and Regression
Assessing the model
Correlation (r)
Influential Observations
Inference for the Response
Interval Estimates
A Full Example
Summary
Slide 2/99
Reference
Slide 3/99
Learning Outcomes:
At the end of this topic you should be able to:
- Perform a regression to compare models with a numerical explanatory variable.
- State and check the assumptions required for regression.
- Calculate confidence intervals and hypothesis tests for parameters in a regression model.
- Identify influential observations and explain how they impact regression analysis.
- Interpret regression models, including parameters, confidence intervals and prediction intervals.
- Write summaries and conclusions in the style of academic research papers, based on confidence intervals and hypothesis tests.
Slide 4/99
The data on car weight was later re-coded into three groups:
- “light”: cars that weighed less than 1200 kg
- “medium”: cars that weighed at least 1200 kg but less than 1500 kg
- “heavy”: cars that weighed more than 1500 kg
Response variable: Fuel consumption (numeric)
Explanatory variable: Weight-code (categorical)
Slide 5/99
$y_{ij} = \mu_i + e_{ij}, \qquad e_{ij} \overset{d}{=} N(0, \sigma)$
$y_{ij}$: fuel consumption of the $j$th car in weight code $i$
$\mu_i$: mean fuel consumption for cars under weight code $i$
$e_{ij}$: random error for car $j$ in weight code $i$
Parameter estimates:
- $\hat{\mu}_{light} = 7.8$ L/100km
- $\hat{\mu}_{medium} = 9.7$ L/100km
- $\hat{\mu}_{heavy} = 11.9$ L/100km
- $\hat{\sigma} = s_{resid} = s_{pooled} = 0.69$ L/100km
Slide 6/99
Model — graphically
Slide 7/99
ANOVA output
Slide 8/99
ANOVA output
Slide 9/99
. . . further analysis
Slide 10/99
Regression model
Alternative approach
What if we used the actual weights of the cars . . . rather than just the weight codes?
- Response variable: Fuel consumption (numeric, as before)
- Explanatory variable: Car Weight (numeric, not categorical!)
That changes the kinds of questions that we’re interested in, and
also increases the breadth of potential analyses.
Slide 11/99
Regression model
Slide 12/99
Regression model
Slide 13/99
Regression model
$y_i = \alpha + \beta x_i + e_i, \qquad i = 1, \ldots, n$
And it is assumed that the errors ($e_i$) are a random sample from a $N(0, \sigma)$ distribution.
Slide 14/99
Slide 15/99
$e_i \overset{d}{=} N(0, \sigma)$
Slide 16/99
Furthermore:
We expect about 95% of the observations to lie within $2\sigma$ of the line.
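The 95% figure follows from the normal error assumption: a normal random error falls within two standard deviations of zero about 95% of the time. A minimal simulation sketch is given below. The values of `alpha`, `beta`, `sigma` and the range of weights are hypothetical, chosen only for illustration, and are not estimates from the slides.

```python
import random

def fraction_within_two_sigma(alpha, beta, sigma, n=10_000, seed=1):
    """Simulate y_i = alpha + beta * x_i + e_i with e_i ~ N(0, sigma) and
    return the fraction of points lying within 2 * sigma of the true line."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x = rng.uniform(800, 2000)      # hypothetical car weights (kg)
        e = rng.gauss(0.0, sigma)       # random error about the line
        y = alpha + beta * x + e
        if abs(y - (alpha + beta * x)) < 2 * sigma:
            inside += 1
    return inside / n

# Hypothetical parameter values, for illustration only
frac = fraction_within_two_sigma(alpha=0.0, beta=0.008, sigma=0.7)
print(f"fraction within 2 sigma of the line: {frac:.3f}")
```

The simulated fraction should sit close to 0.95 whatever values are chosen for the three parameters, because only the error distribution matters.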
Slide 17/99
$y_i = \alpha + \beta x_i + e_i, \qquad e_i \overset{d}{=} N(0, \sigma)$
We estimate the equation using our sample of data.
$E(\hat{Y} \mid x) = \hat{\alpha} + \hat{\beta} x$
Slide 18/99
Slide 19/99
Slide 20/99
The Model
$y_i = \alpha + \beta x_i + e_i, \qquad i = 1, \ldots, n$
Assumptions:
- $e_i \overset{d}{=} N(0, \sigma)$
- the $e_i$ are independent
- The equation has the right form. Usually, if the assumptions about the errors are satisfied, the equation is the right one.
Slide 21/99
Parameter Estimates
Slide 22/99
Parameter Estimates
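Estimates of $\alpha$, $\beta$, $\sigma$:
$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r \, \frac{s_Y}{s_X}$
$\hat{\alpha} = \bar{y} - \hat{\beta} \bar{x}$
$\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} \left( y_i - (\hat{\alpha} + \hat{\beta} x_i) \right)^2}{n - 2}} = \sqrt{\frac{\sum (\text{resid})^2}{n - k}} = \sqrt{\frac{\sum (y_{observed} - \hat{y})^2}{n - 2}}$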
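The three estimates can be computed directly from these formulas. The sketch below applies them to the toddler lunch data that appears in the full example later in these slides (Time in minutes, Calories consumed). The function name `fit_line` is an illustrative choice, not something taken from Minitab.

```python
import math

# Toddler lunch data from the full example in these slides
time = [21.4, 30.8, 37.7, 33.5, 32.8, 39.5, 22.8, 34.1, 33.9, 43.8,
        42.4, 43.1, 29.2, 31.3, 28.6, 32.9, 30.6, 35.1, 33.0, 43.7]
calories = [472, 498, 465, 456, 423, 437, 508, 431, 479, 454,
            450, 410, 504, 437, 489, 436, 480, 439, 444, 408]

def fit_line(x, y):
    """Least-squares estimates (alpha_hat, beta_hat, sigma_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx                      # slope
    alpha_hat = y_bar - beta_hat * x_bar      # intercept
    # Residual standard deviation, with n - 2 degrees of freedom
    sse = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / (n - 2))
    return alpha_hat, beta_hat, sigma_hat

alpha_hat, beta_hat, sigma_hat = fit_line(time, calories)
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.4f}, sigma_hat = {sigma_hat:.2f}")
```

Statistical software reports the same quantities as the coefficient table and the residual standard deviation (S in Minitab output).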
Slide 23/99
Parameter Estimates
. . . from Minitab
Slide 24/99
Parameter Estimates
Slide 25/99
Parameter Estimates
Slide 26/99
Model Assumptions
Assumptions of model
The model assumes that the variation in fuel consumption, within each level of car weight, is described by a normal distribution with the same standard deviation for each level of car weight.
We check that:
- residuals have come from a normally distributed population of residuals (normality)
- there is the same variability in fuel consumption for each level of car weight (constant variance)
- residuals represent random draws from a population (independence of observations)
Slide 27/99
Model Assumptions
. . . assumptions re-stated
The model
$y_i = \alpha + \beta x_i + e_i, \qquad e_i \overset{d}{=} N(0, \sigma)$
$E(Y \mid x) = \alpha + \beta x$
$\Rightarrow$ for a particular (fixed) value of the explanatory variable:
$E(Y \mid x) = \alpha + \beta x$: the y-value we ‘expect’ is given by the equation of the line
$\hat{y}_x = \hat{\alpha} + \hat{\beta} x$: the y-value we ‘predict’ is given by the estimated equation of the line
$Y \mid x \overset{d}{=} N(\alpha + \beta x, \sigma)$
That is, for a given value of x, the observations will be:
- ‘centered’ on the line; and
- normally distributed about the line.
Slide 28/99
Model Assumptions
. . . visually
Slide 29/99
Model Assumptions
Slide 30/99
Model Assumptions
Slide 31/99
Model Assumptions
Slide 32/99
Model Assumptions
Slide 33/99
Model Assumptions
Slide 34/99
Model Assumptions
Slide 35/99
Model Assumptions
Checking Independence
Assume that the cars were randomly selected, which means that the observations can be treated as independent.
Slide 36/99
Minitab Output
Slide 37/99
Minitab Output
Slide 38/99
$a = \hat{\alpha} =$ ____  $se(A) =$ ____
$b = \hat{\beta} =$ ____  $se(B) =$ ____
$\hat{\sigma} =$ ____
Slide 39/99
A 95% CI for $\alpha$, the fuel consumption of a car with zero weight, is:
Slide 40/99
. . . degrees of freedom
Slide 41/99
Slide 42/99
$H_0: \beta = 0$
$H_1: \beta \neq 0$
$\text{test statistic} = \frac{\hat{\beta} - 0}{se(B)} = \frac{0.0080238}{0.0003870} = 20.73$
Slide 42/99
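We can also test specific values for $\alpha$ and $\beta$ using the same approach we have used earlier.
For example:
$H_0: \beta = 0.009 = \beta_0$
$H_1: \beta \neq 0.009$
$\text{test statistic} = \frac{\hat{\beta} - \beta_0}{se(B)} = \frac{0.0080238 - 0.009}{0.0003870} = -2.52$
test statistic $\overset{d}{=} t_{19}$ if $H_0$ is true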
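The arithmetic of this test can be checked with a few lines of code. The sketch below uses the slope estimate and standard error quoted on the slides and evaluates the ratio with the unrounded estimate. The critical value `t_crit` is the 0.975 quantile of the $t_{19}$ distribution taken from standard tables and hard-coded here as an assumption, since the Python standard library has no t distribution.

```python
# Values quoted on the slides for the fuel consumption regression
beta_hat = 0.0080238    # estimated slope
se_beta = 0.0003870     # standard error of the slope, se(B)
beta_0 = 0.009          # hypothesised slope under H0

# Assumption: t_19(0.975) from standard t tables (two-sided test at the 5% level)
t_crit = 2.093

t_stat = (beta_hat - beta_0) / se_beta
reject_h0 = abs(t_stat) > t_crit

print(f"test statistic = {t_stat:.3f}")
print("reject H0 at the 5% level" if reject_h0 else "do not reject H0 at the 5% level")
```

Comparing the absolute value of the statistic with the tabulated critical value gives the same decision as comparing a two-sided P-value with 0.05.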
Slide 43/99
That is:
$H_0: y_i = \mu_{fuel} + e_i, \qquad e_i \overset{d}{=} N(0, \sigma)$  Model (0)
$H_1: y_i = \alpha + \beta x_i + e_i, \qquad e_i \overset{d}{=} N(0, \sigma)$  Model (1)
Models compared
Slide 45/99
Slide 46/99
Source df SS MS F P-value
Regression
Error
Total
Slide 47/99
In general
The general form of the ANOVA table for a regression analysis is:
Source      df      SS        MS       F              P
Regression  k - 1   SSreg     MSreg    MSreg/MSres    P-value
Residual    n - k   SSres     MSres
Total       n - 1   SStotal
Column notes:
- Source: source of variation
- df: degrees of freedom
- SS: sum of squared deviations
- MS: mean square, calculated as MS = SS/df (the mean squares don’t add)
- F: regression MS compared with residual MS
- P: could it be by chance?
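A short sketch of how the entries of this table are computed may help. It uses the toddler lunch data from the full example later in these slides, with k = 2 parameters (intercept and slope). The function name `anova_table` and the dictionary layout are illustrative choices only.

```python
# Toddler lunch data from the full example in these slides
time = [21.4, 30.8, 37.7, 33.5, 32.8, 39.5, 22.8, 34.1, 33.9, 43.8,
        42.4, 43.1, 29.2, 31.3, 28.6, 32.9, 30.6, 35.1, 33.0, 43.7]
calories = [472, 498, 465, 456, 423, 437, 508, 431, 479, 454,
            450, 410, 504, 437, 489, 436, 480, 439, 444, 408]

def anova_table(x, y, k=2):
    """Sums of squares, mean squares and F for a straight-line regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    fitted = [alpha_hat + beta_hat * xi for xi in x]
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)             # explained by the line
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # left over
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ms_reg = ss_reg / (k - 1)
    ms_res = ss_res / (n - k)
    return {"df": (k - 1, n - k, n - 1),
            "SS": (ss_reg, ss_res, ss_total),
            "MS": (ms_reg, ms_res),
            "F": ms_reg / ms_res}

table = anova_table(time, calories)
print(table)
```

The sums of squares add up to the total (SSreg + SSres = SStotal), as do the degrees of freedom, but the mean squares do not.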
Slide 48/99
Fuel consumption
Slide 49/99
Slide 50/99
Slide 51/99
Slide 52/99
Correlation (r)
Strength of relationship — r
Slide 53/99
Correlation (r)
About r
Slide 54/99
Correlation (r)
Inference using r
$H_0: \rho = 0$
is equivalent to testing
$H_0: \beta = 0$
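The equivalence can be seen numerically: the t statistic built from the sample correlation, $t = r\sqrt{n-2}/\sqrt{1-r^2}$, is identical to the t statistic for the slope, $\hat{\beta}/se(B)$. The sketch below checks this on the toddler lunch data from the full example later in these slides. The variable names are illustrative only.

```python
import math

# Toddler lunch data from the full example in these slides
time = [21.4, 30.8, 37.7, 33.5, 32.8, 39.5, 22.8, 34.1, 33.9, 43.8,
        42.4, 43.1, 29.2, 31.3, 28.6, 32.9, 30.6, 35.1, 33.0, 43.7]
calories = [472, 498, 465, 456, 423, 437, 508, 431, 479, 454,
            450, 410, 504, 437, 489, 436, 480, 439, 444, 408]

n = len(time)
x_bar, y_bar = sum(time) / n, sum(calories) / n
sxx = sum((x - x_bar) ** 2 for x in time)
syy = sum((y - y_bar) ** 2 for y in calories)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(time, calories))

# t statistic from the sample correlation r
r = sxy / math.sqrt(sxx * syy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# t statistic from the slope and its standard error
beta_hat = sxy / sxx
sse = syy - beta_hat * sxy                # residual sum of squares
sigma_hat = math.sqrt(sse / (n - 2))
se_beta = sigma_hat / math.sqrt(sxx)
t_from_slope = beta_hat / se_beta

print(f"r = {r:.3f}, t from r = {t_from_r:.4f}, t from slope = {t_from_slope:.4f}")
```

Both statistics are compared with the same $t_{n-2}$ distribution, so the two tests always give the same P-value.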
Slide 55/99
Correlation (r)
Slide 56/99
Correlation (r)
Influential Observations
Potentially influential points stand out because they have either a large standardized residual (extreme in the y-direction) or an x-value that lies a long way from the mean of the xs (extreme in the x-direction).
- If any observation has a standardized residual with modulus greater than 2, Minitab lists it as an unusual observation, denoted by R.
- Standardized residuals are residuals that have been adjusted so that, for any given value of x, they have a standard deviation close to 1.
- Observations that are extreme in the x-direction may exert a lot of influence over the model: they have potentially high leverage. Points with unusual values for leverage are also listed by Minitab (denoted by X).
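Both diagnostics can be sketched in a few lines for a straight-line model, using the textbook formulas: leverage $h_i = 1/n + (x_i - \bar{x})^2 / \sum (x_j - \bar{x})^2$ and standardized residual $e_i / (\hat{\sigma}\sqrt{1 - h_i})$. The data are the toddler lunch data from the full example later in these slides. The leverage cut-off of $3p/n$ (with $p = 2$ parameters) is an assumption about how Minitab applies its X flag and should be checked against the Minitab documentation.

```python
import math

# Toddler lunch data from the full example in these slides
time = [21.4, 30.8, 37.7, 33.5, 32.8, 39.5, 22.8, 34.1, 33.9, 43.8,
        42.4, 43.1, 29.2, 31.3, 28.6, 32.9, 30.6, 35.1, 33.0, 43.7]
calories = [472, 498, 465, 456, 423, 437, 508, 431, 479, 454,
            450, 410, 504, 437, 489, 436, 480, 439, 444, 408]

n, p = len(time), 2
x_bar, y_bar = sum(time) / n, sum(calories) / n
sxx = sum((x - x_bar) ** 2 for x in time)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(time, calories))
beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar

residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(time, calories)]
sigma_hat = math.sqrt(sum(e ** 2 for e in residuals) / (n - p))

# Leverage: how far each x-value lies from the mean of the xs
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in time]
# Standardized residuals: residuals rescaled to have standard deviation near 1
std_resid = [e / (sigma_hat * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

for i, (h, s) in enumerate(zip(leverage, std_resid), start=1):
    flags = ("R" if abs(s) > 2 else "") + ("X" if h > 3 * p / n else "")  # assumed cut-offs
    if flags:
        print(f"obs {i}: leverage = {h:.3f}, std resid = {s:.2f}, flag = {flags}")
```

A point flagged R is unusual in the y-direction, while a point flagged X is unusual in the x-direction. Neither flag alone shows that the point changes the fitted line, which is why the fit is compared with and without the point.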
Slide 59/99
Influential Observations
Leverage
Slide 60/99
Influential Observations
Leverage
Slide 61/99
Influential Observations
Slide 62/99
Influential Observations
. . . point deleted
Slide 63/99
Influential Observations
. . . point included
Slide 64/99
Influential Observations
. . . point included
Slide 65/99
Influential Observations
Caution!
Slide 66/99
Influential Observations
Slide 67/99
Influential Observations
ANOVA Output
Slide 68/99
Influential Observations
ANOVA Output
Slide 69/99
Influential Observations
Slide 70/99
Interval Estimates
Slide 71/99
Interval Estimates
Slide 72/99
Interval Estimates
Prediction Interval
Slide 73/99
Interval Estimates
Prediction Interval
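$\mu_X \pm Z(0.975) \times \sigma_X$
$\mu_X$ unknown? $\hat{\mu}_X \pm Z(0.975) \times \sqrt{\sigma_X^2 + \frac{\sigma_X^2}{n}}$
$\sigma_X$ unknown? $\hat{\mu}_X \pm t_{df_{error}}(0.975) \times \sqrt{s_X^2 + \frac{s_X^2}{n}}$
$se(\text{prediction}) = \sqrt{s_X^2 + \frac{s_X^2}{n}}$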
Slide 74/99
Interval Estimates
Prediction Interval
$\hat{\mu}_{C \mid time} \pm \text{dist. value} \times \sqrt{(\text{how individuals vary about the mean}) + (\text{sample-to-sample variability in estimating the mean})}$
Slide 75/99
Interval Estimates
Confidence Intervals
And what if you wanted to predict, with 95% confidence, a range for the expected (or mean) calorie intake of children who sit at the table for 33 minutes?
$95\%\ \text{CI}(\mu_C) = \hat{\mu}_C \pm t_{n-1} \times \frac{s_C}{\sqrt{n}}$
Slide 76/99
A Full Example
Does how long toddlers sit at the lunch table help predict how
many calories they consume?
Time (x) 21.4 30.8 37.7 33.5 32.8 39.5 22.8 34.1 33.9 43.8
Calories (y) 472 498 465 456 423 437 508 431 479 454
Time (x) 42.4 43.1 29.2 31.3 28.6 32.9 30.6 35.1 33.0 43.7
Calories (y) 450 410 504 437 489 436 480 439 444 408
Slide 77/99
A Full Example
The Model
Stat → Regression → Regression → Fit Regression Model . . .
Slide 78/99
A Full Example
Interpreting the Model
Slope: for every additional minute spent sitting at the table, toddlers consume about 3.1 fewer calories, on average.
Intercept: no meaningful interpretation.
A Full Example
Slide 80/99
A Full Example
Graphically
[Figure: fitted line plot of Calories against Time; handwritten annotations read 561.2 and 3.0951]
Slide 81/99
A Full Example
Assessing Assumptions
[Figure: residual plots for checking assumptions; handwritten annotation reads p = 0.047]
Slide 82/99
A Full Example
Assessing Assumptions
Slide 83/99
A Full Example
Inference
- Give the meaning of the slope of the regression line and find a 95% confidence interval for the slope.
- What proportion of the variation in calories is explained by the relationship between Calories and Time?
- Find the correlation between Calories and Time.
- What can you conclude about the relationship between Time and Calories?
- Is this conclusion consistent with the results given in the ANOVA Table?
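The first two questions can be worked through numerically. The sketch below computes a 95% confidence interval for the slope, $\hat{\beta} \pm t_{n-2}(0.975) \times se(B)$ with $se(B) = \hat{\sigma}/\sqrt{\sum (x_i - \bar{x})^2}$, and the proportion of variation explained, $R^2 = 1 - SS_{res}/SS_{total}$, from the Time and Calories data above. The critical value `t_crit` is the 0.975 quantile of $t_{18}$ from standard tables, hard-coded as an assumption.

```python
import math

# Toddler lunch data from this example
time = [21.4, 30.8, 37.7, 33.5, 32.8, 39.5, 22.8, 34.1, 33.9, 43.8,
        42.4, 43.1, 29.2, 31.3, 28.6, 32.9, 30.6, 35.1, 33.0, 43.7]
calories = [472, 498, 465, 456, 423, 437, 508, 431, 479, 454,
            450, 410, 504, 437, 489, 436, 480, 439, 444, 408]

n = len(time)
x_bar, y_bar = sum(time) / n, sum(calories) / n
sxx = sum((x - x_bar) ** 2 for x in time)
syy = sum((y - y_bar) ** 2 for y in calories)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(time, calories))

beta_hat = sxy / sxx
ss_res = syy - beta_hat * sxy              # residual sum of squares
sigma_hat = math.sqrt(ss_res / (n - 2))
se_beta = sigma_hat / math.sqrt(sxx)

t_crit = 2.101                             # assumption: t_18(0.975) from tables
ci_slope = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)
r_squared = 1 - ss_res / syy

print(f"slope = {beta_hat:.3f}, 95% CI = ({ci_slope[0]:.2f}, {ci_slope[1]:.2f})")
print(f"R-squared = {r_squared:.3f}")
```

An interval for the slope that excludes zero corresponds to rejecting $H_0: \beta = 0$ at the 5% level, which is the link to the conclusion drawn from the ANOVA table.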
Slide 84/99
A Full Example
Making Predictions
Predict the expected calorie consumption for toddlers who sit at
the table for 33 minutes.
95% confidence interval for $E(Y \mid x)$
A Full Example
We are 95% confident that toddlers who sit at the table for 33
minutes will consume, on average, somewhere between 448 and
470 calories.
Slide 86/99
A Full Example
Predict the calorie consumption for a toddler who sits at the table
for 33 minutes.
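The confidence interval for the mean response and the prediction interval for an individual toddler differ only in their standard errors. A minimal sketch under the usual straight-line formulas is given below: $se(\text{fit}) = \hat{\sigma}\sqrt{1/n + (x_0 - \bar{x})^2 / \sum (x_i - \bar{x})^2}$ and $se(\text{prediction}) = \sqrt{\hat{\sigma}^2 + se(\text{fit})^2}$. The function name `intervals_at` is illustrative, and `t_crit` is the 0.975 quantile of $t_{18}$ from standard tables, hard-coded as an assumption.

```python
import math

# Toddler lunch data from this example
time = [21.4, 30.8, 37.7, 33.5, 32.8, 39.5, 22.8, 34.1, 33.9, 43.8,
        42.4, 43.1, 29.2, 31.3, 28.6, 32.9, 30.6, 35.1, 33.0, 43.7]
calories = [472, 498, 465, 456, 423, 437, 508, 431, 479, 454,
            450, 410, 504, 437, 489, 436, 480, 439, 444, 408]

def intervals_at(x0, x, y, t_crit=2.101):
    """95% CI for E(Y | x0) and 95% PI for a new Y at x0 (t_crit assumed from tables)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    sse = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / (n - 2))
    fit = alpha_hat + beta_hat * x0
    # Sample-to-sample variability in estimating the mean at x0
    se_fit = sigma_hat * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    # Adds how individuals vary about the mean
    se_pred = math.sqrt(sigma_hat ** 2 + se_fit ** 2)
    ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)
    pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)
    return fit, ci, pi

fit, ci, pi = intervals_at(33, time, calories)
print(f"fit = {fit:.1f}")
print(f"95% CI for the mean: ({ci[0]:.0f}, {ci[1]:.0f})")
print(f"95% PI for one toddler: ({pi[0]:.0f}, {pi[1]:.0f})")
```

The prediction interval is always wider than the confidence interval at the same x-value, because it includes the variation of individual toddlers about the mean as well as the uncertainty in the estimated mean.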
Slide 87/99
Slide 88/99
Approach 1: Regression
Slide 90/99
Slide 91/99
Slide 92/99
Regression models
Slide 93/99
Not good
Slide 94/99
Slide 95/99
- Check the assumptions of the model: independent observations (random sample), normally distributed errors, constant (equal) variance
- Coefficient of determination: $R^2$
Slide 98/99
$t_{df_{error}} \times se(\text{fit})$
- Prediction intervals for an individual observation $(Y \mid x_0)$