Introduction Regression Model Inference Assessing the model Inference for the Response Summary

Chapter 10: Linear Regression

MAST10010 Data Analysis

Department of Mathematics & Statistics


University of Melbourne

September 23, 2024


Outline
Introduction
Separate Means model c.f. Regression
Regression model
Regression Model
Parameter Estimates
Model Assumptions
Inference
ANOVA and Regression
Assessing the model
Correlation (r)
Influential Observations
Inference for the Response
Interval Estimates
A Full Example
Summary

Reference

- Utts & Heckard, 4th & 5th Ed., Chapters 3 & 14
- De Veaux, Velleman & Bock, 3rd Ed., Chapters 7 & 8


Learning Outcomes:
At the end of this topic you should be able to:
- Perform a regression to compare models with a numerical explanatory variable.
- State and check the assumptions required for regression.
- Calculate confidence intervals and hypothesis tests for parameters in a regression model.
- Identify influential observations and explain how they impact regression analysis.
- Interpret regression models, including parameters, confidence intervals and prediction intervals.
- Write summaries and conclusions in the style of academic research papers, based on confidence intervals and hypothesis tests.

Separate Means model c.f. Regression

Fuel consumption of cars

A study conducted by the RACV recorded the fuel consumption (L/100 km) and weight of several different car makes and models.

The data on car weight was later re-coded into three groups:
- "light": cars that weighed less than 1200 kg
- "medium": cars that weighed at least 1200 kg but less than 1500 kg
- "heavy": cars that weighed more than 1500 kg
Response variable: Fuel consumption (numeric)
Explanatory variable: Weight-code (categorical)

"How does car weight affect fuel consumption?"


Separate Means model

$$y_{ij} = \mu_i + e_{ij}, \qquad e_{ij} \overset{d}{=} N(0, \sigma)$$

- $y_{ij}$: fuel consumption of the $j$th car in weight code $i$
- $\mu_i$: mean fuel consumption for cars in weight code $i$
- $e_{ij}$: random error for car $j$ in weight code $i$

Parameter estimates:
- $\hat{\mu}_{light} = 7.8$ L/100km
- $\hat{\mu}_{medium} = 9.7$ L/100km
- $\hat{\mu}_{heavy} = 11.9$ L/100km
- $\hat{\sigma} = s_{resid} = s_{pooled} = 0.69$ L/100km


Model (graphically)

ANOVA output

. . . further analysis

Regression model

Alternative approach

What if we used the actual weights of the cars. . . rather than just
the weight codes?
- Response variable: Fuel consumption (numeric, as before)
- Explanatory variable: Car Weight (numeric, not categorical!)
That changes the kinds of questions that we’re interested in, and
also increases the breadth of potential analyses.


Form of the Model

In our previous analysis of the RACV data we fitted a separate means model and compared this with a model with no explanatory variables (a single mean model).

In the regression approach, we fit a separate mean to every possible value of the explanatory variable, and we also fit a relationship between these means.

In regression analysis we investigate the relationship between two (or more) numeric (usually continuous) variables.


Separate Means c.f. Regression model

Simple linear regression model: a model for one numeric response variable and one numeric explanatory variable.

$$y_i = \alpha + \beta x_i + e_i, \qquad i = 1, \dots, n$$

It is assumed that the errors ($e_i$) are a random sample from a $N(0, \sigma)$ distribution.

The model consists of:
- an equation; and
- a statement of the assumed distribution of the errors.


Some Terminology (reminder)

About this model

$$e_i \overset{d}{=} N(0, \sigma)$$

$\Rightarrow$ For a given value of $x$, observations are distributed normally about the line and have the same standard deviation $\sigma$ (and hence the same variance) for all values of $x$.

Furthermore:
We expect approximately 95% of the observations to be within $2\sigma$ of the line.
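
The "95% within 2 standard deviations" rule of thumb can be checked directly from the normal distribution. The sketch below is illustrative only: it uses Python's standard-library `statistics.NormalDist`, and the value of sigma (0.389, the residual standard deviation quoted later for the fuel data) is just a convenient choice, since the probability does not depend on it.

```python
from statistics import NormalDist


def prob_within(k: float, sigma: float = 1.0) -> float:
    """Return P(|e| < k * sigma) for an error e distributed N(0, sigma)."""
    errors = NormalDist(mu=0.0, sigma=sigma)
    return errors.cdf(k * sigma) - errors.cdf(-k * sigma)


# Probability of falling within 2 sigma of the line (the rule of thumb)
p_two_sigma = prob_within(2.0, sigma=0.389)
# Probability of falling within 1.96 sigma (the multiplier for exactly 95%)
p_exact_95 = prob_within(1.96, sigma=0.389)

print(f"within 2.00 sigma: {p_two_sigma:.4f}")
print(f"within 1.96 sigma: {p_exact_95:.4f}")
```

The multiplier 2 is a rounded version of 1.96, which is why the statement is "approximately 95%".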


Estimating the model

$$y_i = \alpha + \beta x_i + e_i, \qquad e_i \overset{d}{=} N(0, \sigma)$$

We estimate the equation using our sample of data.

Model for the sample data:

$$y_i = \hat{\alpha} + \hat{\beta} x_i + \hat{e}_i, \qquad \hat{e}_i \overset{d}{=} N(0, \hat{\sigma})$$

Estimated equation (the fit):

$$E(\hat{Y} \mid x) = \hat{\alpha} + \hat{\beta} x$$


About this estimated equation of the line

Applet: "Sampling from a bivariate population"

For Fuel Consumption of Cars

The Model

$$y_i = \alpha + \beta x_i + e_i, \qquad i = 1, \dots, n$$

Assumptions:
- $e_i \overset{d}{=} N(0, \sigma)$
- the $e_i$ are independent
- the equation has the right form; usually, if the assumptions about the errors are satisfied $\Rightarrow$ right equation.


Parameter Estimates

Method of Least Squares

To determine estimates of the slope ($\hat{\beta}$) and intercept ($\hat{\alpha}$) for the model we use the Method of Least Squares.

In this method, $\hat{\beta}$ and $\hat{\alpha}$ are determined in such a way that the sum of the squared residuals is as small as possible. That is, we determine the parameter estimates to minimize:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum (\text{resid})^2$$

As it turns out, this method produces a straight line that connects the estimated means for each distinct value of car weight; it is very similar to fitting a line by eye.

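
Estimates of $\alpha$, $\beta$, $\sigma$

Parameter estimates using the Method of Least Squares:

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r\,\frac{s_Y}{s_X}$$

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

$$\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - (\hat{\alpha} + \hat{\beta} x_i)\right)^2}{n - 2}} = \sqrt{\frac{\sum (\text{resid})^2}{n - k}} = \sqrt{\frac{\sum (y_{observed} - \hat{y})^2}{n - 2}}$$

Here $k = 2$ is the number of parameters in the equation ($\alpha$ and $\beta$).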
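
As a rough illustration of these least-squares formulas, the sketch below computes the slope, intercept and residual standard deviation in plain Python. The function name `least_squares` is made up for this example, and the data are the Time and Calories values from the "Calories and Time" table in the full example later in these slides. Minitab is the tool actually used in the course, so the printed values should be close to, but need not exactly match, the Minitab figures quoted there.

```python
from math import sqrt

# Time at the table (minutes) and calories consumed, copied from the
# "Calories and Time" data table in the full example.
time = [21.4, 30.8, 37.7, 33.5, 32.8, 39.5, 22.8, 34.1, 33.9, 43.8,
        42.4, 43.1, 29.2, 31.3, 28.6, 32.9, 30.6, 35.1, 33.0, 43.7]
calories = [472, 498, 465, 456, 423, 437, 508, 431, 479, 454,
            450, 410, 504, 437, 489, 436, 480, 439, 444, 408]


def least_squares(x, y):
    """Return (alpha_hat, beta_hat, sigma_hat) for y = alpha + beta * x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    # Residuals: observed minus fitted values
    residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]
    # Divide by n - 2 because two parameters (alpha, beta) were estimated
    sigma_hat = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return alpha_hat, beta_hat, sigma_hat


alpha_hat, beta_hat, sigma_hat = least_squares(time, calories)
print(f"intercept = {alpha_hat:.1f}, slope = {beta_hat:.3f}, s = {sigma_hat:.2f}")
```

One property worth noticing: because the intercept is defined as the mean of y minus the slope times the mean of x, the fitted line always passes through the point of means.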


. . . from Minitab

Fitted line plot (Minitab)


Model Assumptions

Assumptions of model
The model assumes that the variation in fuel consumption, within each level of car weight, is described by a normal distribution with the same standard deviation for each level of car weight.

Furthermore, it is assumed that the observations are independent.

We check that:
- residuals have come from a normally distributed population of residuals (normality)
- there is the same variability in fuel consumption for each level of car weight (constant variance)
- residuals represent random draws from a population (independence of observations)

. . . assumptions re-stated
The model

$$y_i = \alpha + \beta x_i + e_i, \qquad e_i \overset{d}{=} N(0, \sigma)$$

$\Rightarrow$ for a particular (fixed) value of the explanatory variable:
- $E(Y \mid x) = \alpha + \beta x$: the y-value we 'expect' is given by the equation of the line
- $\hat{y}_x = \hat{\alpha} + \hat{\beta} x$: the y-value we 'predict' is given by the estimated equation of the line

$$Y \mid x \overset{d}{=} N(\alpha + \beta x, \sigma)$$

That is, for a given value of $x$, the observations will be:
- 'centered' on the line; and
- normally distributed about the line.
Slide 28/99
Introduction Regression Model Inference Assessing the model Inference for the Response Summary

Model Assumptions

. . . visually

I observations are centered on the line


I for any given value of X observations have same variability ( )
=) approximately 95% of observations within ±2 of the line


Checking assumptions: what to look for

- Residuals versus fitted values: Graph should show no patterns.
  - funnel shapes indicate unequal variance (and a need to transform the response variable);
  - curved shapes indicate that the model pattern is not describing the data pattern (we need a different type of equation);
  - unusual points in the vertical direction indicate points poorly described by the model.
- Normal probability plot of residuals: Observed data points should closely follow the straight line; P-value greater than 0.05.


Example: fuel consumption data (Residuals vs Fits)

Residuals vs Fits: problems

Back to Fuel Consumption


Checking Independence

We assume that the cars were randomly selected, which would mean that the observations can be treated as independent.

Since the assumptions of the model are all satisfied, we can now proceed with formal hypothesis tests to determine whether the relationship we are observing in the sample can reasonably be generalised to the broader population of cars.

We begin by examining the output produced by Minitab.


Minitab Output

Parameter estimates from Minitab

$a = \hat{\alpha} = -0.8178$, $se(A) = 0.5064$

$b = \hat{\beta} = 0.008023$, $se(B) = 0.000387$

$\hat{\sigma} = 0.389$


Confidence Intervals for $\alpha$ and $\beta$

A 95% CI for $\beta$, the rate of increase in fuel consumption (L/100km per kg), is:

$$\begin{aligned}
\hat{\beta} \pm t_{19}(0.975) \times se(B) &= 0.008023 \pm 2.093 \times 0.000387 \\
&= 0.008023 \pm 0.00081 \\
&= (0.0072, 0.0088)
\end{aligned}$$

A 95% CI for $\alpha$, the fuel consumption of a car with zero weight, is:

$$\begin{aligned}
\hat{\alpha} \pm t_{19}(0.975) \times se(A) &= -0.8178 \pm 2.093 \times 0.5064 \\
&= -0.8178 \pm 1.0599 \\
&= (-1.878, 0.242)
\end{aligned}$$
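
The two intervals follow the same recipe (estimate ± t value × standard error), which the short sketch below makes explicit. The helper name `t_interval` is made up for illustration, and the t value 2.093 is simply copied from the slide rather than computed, because Python's standard library has no t-distribution quantile function. The intercept estimate is taken to be negative (-0.8178), the sign that reproduces the upper limit 0.242.

```python
def t_interval(estimate: float, std_error: float, t_crit: float) -> tuple[float, float]:
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


T19_975 = 2.093  # t_19(0.975), as quoted on the slide

# Slope: beta_hat and se(B) from the Minitab output
slope_ci = t_interval(0.008023, 0.000387, T19_975)
# Intercept: alpha_hat and se(A) from the Minitab output
intercept_ci = t_interval(-0.8178, 0.5064, T19_975)

print("slope CI:    ", tuple(round(v, 4) for v in slope_ci))
print("intercept CI:", tuple(round(v, 3) for v in intercept_ci))
```

Note that the interval for the slope excludes zero while the interval for the intercept includes it.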


. . . degrees of freedom

In both cases the distribution value is obtained from the $t$ distribution with 19 degrees of freedom, because both standard errors are multiples of $s = \hat{\sigma} = 0.389 = \sqrt{0.151}$ which, from the ANOVA, is an estimate of $\sigma$ on 19 df.


Hypothesis tests for $\alpha$ and $\beta$

$$H_0: \beta = 0 \qquad H_1: \beta \neq 0$$

$$\text{test statistic} = \frac{\hat{\beta} - 0}{se(B)} = \frac{0.0080238}{0.0003870} = 20.73$$

We can also test specific values for $\alpha$ and $\beta$ using the same approach we have used earlier.
For example:

$$H_0: \beta = 0.009 = \beta_0 \qquad H_1: \beta \neq 0.009$$

$$\text{test statistic} = \frac{\hat{\beta} - \beta_0}{se(B)} = \frac{0.0080238 - 0.009}{0.0003870} = -2.52 \overset{d}{=} t_{19} \text{ if } H_0 \text{ is true}$$
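
Both tests use the same statistic, (estimate - hypothesised value) / standard error. The sketch below recomputes it for the two null hypotheses using the numbers quoted on the slides; with the unrounded slope estimate 0.0080238, the second statistic works out to about -2.52. The helper name `t_statistic` is hypothetical, and the comparison against the two-sided 5% critical value t_19(0.975) = 2.093 is one way of reaching a decision without a P-value, not necessarily the way the lecture does it.

```python
def t_statistic(estimate: float, hypothesised: float, std_error: float) -> float:
    """Return (estimate - hypothesised value) / standard error."""
    return (estimate - hypothesised) / std_error


BETA_HAT = 0.0080238   # slope estimate from the Minitab output
SE_B = 0.0003870       # standard error of the slope
T19_975 = 2.093        # two-sided 5% critical value on 19 df (from the slides)

t_zero = t_statistic(BETA_HAT, 0.0, SE_B)        # H0: beta = 0
t_specific = t_statistic(BETA_HAT, 0.009, SE_B)  # H0: beta = 0.009

for label, t in [("beta = 0", t_zero), ("beta = 0.009", t_specific)]:
    decision = "reject H0" if abs(t) > T19_975 else "do not reject H0"
    print(f"H0: {label}: t = {t:.2f}, {decision} at the 5% level")
```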


ANOVA and Regression

The ANOVA Hypothesis

$H_0$: There is no linear relationship between Fuel and Weight
$H_1$: There is a linear relationship between Fuel and Weight

That is:

$$H_0: y_i = \mu_{fuel} + e_i, \qquad e_i \overset{d}{=} N(0, \sigma) \qquad \text{Model (0)}$$

$$H_1: y_i = \alpha + \beta x_i + e_i, \qquad e_i \overset{d}{=} N(0, \sigma) \qquad \text{Model (1)}$$

As in the previous chapter, we partition the variability and compare the variability explained by the $H_1$ model with the variability left unexplained by the $H_1$ model.

Models compared

Analysis by Partitioning variability

Source       df    SS    MS    F    P-value
Regression
Error
Total


In general

The general form of the ANOVA table for a regression analysis is:

Source       df      SS        MS           F               P
Regression   k - 1   SSreg     MS = SS/df   MSreg / MSres   P-value
Residual     n - k   SSres     MS = SS/df
Total        n - 1   SStotal   (don't add)

Column meanings:
- Source: source of variation
- df: degrees of freedom
- SS: sum of squared deviations
- MS: mean square, calculated as SS/df
- F: regression MS compared with residual MS
- P: could it be by chance?
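
To see how the table is filled in, the sketch below builds it from two sums of squares. It uses the fuel-consumption figures quoted elsewhere in these slides (SSregression = 65.09, SStotal = 67.97). The sample size n = 21 is an assumption inferred from the 19 error degrees of freedom (n - 2 = 19), and the function name `regression_anova` is hypothetical. The P-value column is left out because it needs an F-distribution function that the standard library does not provide.

```python
from math import sqrt


def regression_anova(ss_reg: float, ss_total: float, n: int, k: int = 2) -> dict:
    """Fill in the regression ANOVA table; k is the number of parameters."""
    ss_res = ss_total - ss_reg          # sums of squares add up
    df_reg, df_res = k - 1, n - k       # degrees of freedom add up to n - 1
    ms_reg = ss_reg / df_reg            # MS = SS / df
    ms_res = ss_res / df_res
    return {
        "df": (df_reg, df_res, n - 1),
        "SS": (ss_reg, ss_res, ss_total),
        "MS": (ms_reg, ms_res),         # mean squares are not added
        "F": ms_reg / ms_res,
        "R2": ss_reg / ss_total,
        "s": sqrt(ms_res),              # residual standard deviation
    }


table = regression_anova(ss_reg=65.09, ss_total=67.97, n=21)
print(f"df (reg, res, total): {table['df']}")
print(f"MS residual = {table['MS'][1]:.3f}, s = {table['s']:.3f}")
print(f"F = {table['F']:.1f}, R-squared = {table['R2']:.1%}")
```

The F value obtained this way should be close to the square of the t statistic for the slope (20.73), with small differences due to rounding in the quoted sums of squares.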


Hypothesis Test using F distribution output

Fuel consumption

Variability around fitted model

Example: Fuel Consumption model, S = 0.389126

$S = s_{resid}$
- measures how the sample data varies about the estimated model.
- since residuals are distributed normally about the model, we expect approximately 95% of the observations to be within $\pm 2s$ of the line.
- $s_{resid} = \sqrt{MS_{error}} = \sqrt{0.151}$


Variability explained by model

Example: Fuel Consumption model, R-Sq = 95.8%

$R^2$: Coefficient of Determination
- measures how well the data fits the model.
- measures the proportion of the total variability in the response that is explained by the model.
- it is generally reported as a percentage
- $R^2 = \dfrac{SS_{regression}}{SS_{total}} = \dfrac{65.09}{67.97} = 95.8\%$
- For Simple Linear Regression only, $R^2 = r^2$


Correlation (r)

Strength of relationship: r

The strength of the linear relationship is measured by the correlation coefficient, $r$. The correlation coefficient can also be obtained from Minitab.

Stat → Basic Statistics → Correlation...

It is best used as a descriptive statistic describing the strength and direction of the linear relationship between two variables.


About r

Pearson's correlation ($r$) has the following properties:
- $-1 \le r \le 1$.
- $r > 0$ describes positive correlation (upward slope of points), with $r = 1$ for points exactly along a straight line.
- $r < 0$ describes negative correlation (downward slope of points), with $r = -1$ for points exactly along a straight line.
- $r = 0$ describes no linear correlation.
- $r^2 = R^2$.
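
The properties above can be illustrated with a small hand-written version of Pearson's r. The three tiny datasets below are made up purely for illustration (they are not course data): one lies exactly on an upward line, one exactly on a downward line, and one follows a symmetric curve, which shows that r = 0 rules out a linear relationship but not a relationship of some other shape.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: points exactly on an upward straight line
r_up = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
# Hypothetical data: points exactly on a downward straight line
r_down = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])
# Hypothetical data: y = x squared, a perfect curve with no linear trend
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(f"upward line: {r_up:.2f}, downward line: {r_down:.2f}, curve: {r_curve:.2f}")
```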


Inference using r

Testing $H_0: \rho = 0$ is equivalent to testing $H_0: \beta = 0$.


Cautionary tales about correlation

1. Gatwick airport data
2. Rochtchina E, Mitchell P, Wang JJ. "Relationship between age and intraocular pressure: the Blue Mountains Eye Study" (2002), Clinical & Experimental Ophthalmology, 30: 173–175.
Note: a significant correlation does not imply a causal relationship!


Gatwick airport data

Distance from the start of take-off to the position where the aircraft passes over a noise recorder located 1 km beyond the end of the runway. $r = 0.931$

$r_{sharp} = 0.559$, $r_{slow} = 0.144$.



Influential Observations

Potentially influential points stand out because they have either a large standardized residual (extreme in the y-direction) or an x-value that lies a long way from the mean of the xs (extreme in the x-direction).
- If any observation has a standardized residual with modulus greater than 2, then Minitab lists it as an unusual observation, denoted by R.
- Standardized residuals are residuals that have been adjusted so that, for any given value of x, residuals have a standard deviation close to 1.
- Observations that are extreme in the x-direction may exert a lot of influence over the model; they have potentially high leverage. Points with unusual values for leverage are also listed by Minitab (X).

Leverage

- Points with high leverage are extreme in the x-direction.
- The more unusual a point is in the explanatory variable, the more it can potentially change the estimates of the parameters ($\alpha$, $\beta$, etc.)
- A high leverage point can potentially affect the value of R-squared. The effect on $R^2$ depends on whether:
  - the point is in line with the general trend of the remaining data ($\Rightarrow R^2 \uparrow$); or
  - it pulls the line away from the general trend ($\Rightarrow R^2 \downarrow$).


- Values can be computed which quantify how unusual a point is in the explanatory variable(s); these are denoted by HI in Minitab.
- Observations with
  $$HI > \frac{3(p + 1)}{n}$$
  where $p$ = the number of explanatory variables, are identified as unusual by Minitab.
- A large leverage value doesn't necessarily mean the observation is a problem or should be omitted, but it can help us to understand other problems.
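
For simple linear regression, one standard way to compute a leverage value is h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². This formula is not stated on the slides, so it is an assumption here about how the HI values arise. The sketch below applies it, together with the 3(p + 1)/n cut-off, to a made-up set of x-values with one point far from the rest. The function names and the data are hypothetical.

```python
def leverages(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / S_xx for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


def flag_high_leverage(x, p=1):
    """Return indices of points whose leverage exceeds 3(p + 1)/n."""
    cutoff = 3 * (p + 1) / len(x)
    return [i for i, h in enumerate(leverages(x)) if h > cutoff]


# Hypothetical x-values: nine ordinary points and one far to the right
x_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

h = leverages(x_values)
flagged = flag_high_leverage(x_values)

print("largest leverage:", round(max(h), 3))
print("flagged indices: ", flagged)
```

Only the x-values enter the calculation, which matches the point made above: leverage measures how unusual a point is in the explanatory variable, regardless of its response value.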


Example 1: High Leverage points

. . . point deleted

. . . point included


Caution!

- Removing points from the analysis is quite controversial and should not be done lightly.
- Whether to do so or not depends on context and on the extremity of the point(s). Experience helps here.
- If in doubt, a good approach is to do the analysis with the points in and again with the points removed:
  - if removing the points has little effect on the analysis, then just report the results of the analysis with all of the points;
  - if removing the points has a substantial effect, then report both sets of results.


Example 2: Influential Points

Cadmium in fish:

ANOVA Output

Effect of high leverage point on the model . . . and predictions using the model.

Comparing Regression Lines


Interval Estimates

Confidence Intervals and Prediction Intervals

Confidence interval: a range of plausible values for a population parameter (in this case $E(Y \mid x) = \mu_{Y \mid x}$).
Prediction interval: a range of plausible values for an individual observation (in this case $Y \mid x$).

Let $Y$ be distributed $N(\mu, \sigma)$; then
- a confidence interval for the mean ($\mu$) $\to \mu\ (\pm 0)$ as $n \to \infty$
- a (95%) prediction interval for $y$ [a (future) realisation of $Y$] $\to \mu \pm 1.96\sigma$ as $n \to \infty$.


Example: Children and Calories

Predicting how many calories children consume when they sit at the lunch table.

Define $C$ = number of calories consumed, $n = 20$ observations.

To predict the number of calories for a single child (with 95% confidence):


Prediction Interval

$$\hat{\mu}_C = 456 \qquad s_C = \hat{\sigma}_C = 29.94$$

So we predict, with 95% confidence, that a single child will consume between:

$$\begin{aligned}
456 \pm t_{19}(0.975) \sqrt{(29.94)^2 + \frac{29.94^2}{20}} &= 456 \pm 2.093 \times \sqrt{29.94^2 + 6.69^2} \\
&= 456 \pm 64.21 \\
&= (391.8, 520.2) \text{ calories}
\end{aligned}$$
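
The calculation above can be retraced in a few lines. This is only a sketch: the function name `prediction_interval` is made up for illustration, and the t value 2.093 is copied from the slide rather than computed. The two terms under the square root are the spread of individual children (s²) and the uncertainty in the estimated mean (s²/n).

```python
from math import sqrt


def prediction_interval(mean_hat: float, s: float, n: int, t_crit: float):
    """95% prediction interval for one new observation, no explanatory variable."""
    # individual variability plus uncertainty in estimating the mean
    se_prediction = sqrt(s ** 2 + s ** 2 / n)
    margin = t_crit * se_prediction
    return mean_hat - margin, mean_hat + margin


lower, upper = prediction_interval(mean_hat=456, s=29.94, n=20, t_crit=2.093)
print(f"({lower:.1f}, {upper:.1f}) calories")
```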


95% Prediction Interval for $X$ (individual observation), assuming $X \overset{d}{=} N(\mu_X, \sigma_X)$:

$$\mu_X \pm Z(0.975) \times \sigma_X$$

$\mu_X$ unknown?

$$\hat{\mu}_X \pm Z(0.975) \times \sqrt{\sigma_X^2 + \frac{\sigma_X^2}{n}}$$

$\sigma_X$ unknown?

$$\hat{\mu}_X \pm t_{df_{error}}(0.975) \times \sqrt{s_X^2 + \frac{s_X^2}{n}}$$

$$se(\text{prediction}) = \sqrt{s_X^2 + \frac{s_X^2}{n}}$$


What if we had additional information:
- the time the child sat at the table; and
- a model for the relationship between Calories and Time?
What would we predict now?

$$\hat{\mu}_{C \mid time} \pm \text{dist. value} \times \sqrt{(\text{how individuals vary about the mean}) + (\text{sample-to-sample variability in estimating the mean})}$$


Confidence Intervals

And what if you wanted to predict, with 95% confidence, a range for the expected (or mean) calorie intake (for children who sit at the table for 33 minutes)?

$$95\%\,CI(\mu_C) = \hat{\mu}_C \pm t_{n-1} \times \frac{s_C}{\sqrt{n}}$$

$$\begin{aligned}
95\%\,CI(\mu_{C \mid time=33}) &= \hat{\mu}_{C \mid time=33} \pm t_{n-2} \times (\text{error in estimating mean}) \\
&= \hat{\mu}_{C \mid time=33} \pm t_{n-2} \times \sqrt{\frac{s_{resid}^2}{n} + (33 - \bar{x})^2\,\widehat{Var}(B)}
\end{aligned}$$


A Full Example

Example: Calories and Time

Does how long toddlers sit at the lunch table help predict how many calories they consume?

Time (x)      21.4  30.8  37.7  33.5  32.8  39.5  22.8  34.1  33.9  43.8
Calories (y)   472   498   465   456   423   437   508   431   479   454

Time (x)      42.4  43.1  29.2  31.3  28.6  32.9  30.6  35.1  33.0  43.7
Calories (y)   450   410   504   437   489   436   480   439   444   408


The Model
Stat → Regression → Regression → Fit Regression Model...

Interpreting the Model

Slope: We expect calorie intake to decrease by 3.095 calories for every additional minute that toddlers sit at the table.

Intercept: If they don't sit at the table at all (t = 0), we expect that calorie intake would be 561.2 calories.


Graphically

Assessing Assumptions


Inference

- Give the meaning of the slope of the regression line and find a 95% confidence interval for the slope.
- What proportion of the variation in calories is explained by the relationship between Calories and Time?
- Find the correlation between Calories and Time.
- What can you conclude about the relationship between Time and Calories?
- Is this conclusion consistent with the results given in the ANOVA Table?


Making Predictions
Predict the expected calorie consumption for toddlers who sit at the table for 33 minutes.
(95%) confidence interval for $E(Y \mid x)$

$$\hat{\mu}_{Y \mid x=33} = \hat{y}_{x=33} = 561.2 - (3.095 \times 33) = 459.065$$

$$\begin{aligned}
se(\text{fit}) &= \sqrt{\frac{s^2}{n} + (x_0 - \bar{x})^2\,\widehat{Var}(B)} \\
&= \sqrt{\frac{545.5}{20} + (33 - 34)^2 (0.851)^2} \\
&= 5.29
\end{aligned}$$

95% Confidence Interval for $\mu_{Y \mid x=33}$

95% CI(parameter) = estimate ± dist. value × se(estimator)

$$\begin{aligned}
95\%\,CI(\mu_{Y \mid x=33}) &= \hat{y}_{x=33} \pm t_{18}(0.975) \times se(\text{fit}) \\
&= 459.065 \pm 2.101 \times 5.29 \\
&= 459.065 \pm 11.11 \\
&= (447.96, 470.17)
\end{aligned}$$

We are 95% confident that toddlers who sit at the table for 33 minutes will consume, on average, somewhere between 448 and 470 calories.


95% Prediction Interval for $Y \mid x = 33$

Predict the calorie consumption for a toddler who sits at the table for 33 minutes.

$$\begin{aligned}
95\%\,PI(Y \mid x = 33) &= \hat{y} \pm t_{18}(0.975) \times se(\text{prediction}) \\
&= \hat{y} \pm 2.101 \times \sqrt{s^2 + (se(\text{fit}))^2} \\
&= 459.065 \pm 2.101 \times \sqrt{545.5 + (5.29)^2} \\
&= 459.065 \pm 50.314 \\
&= (408.75, 509.38)
\end{aligned}$$

We predict, with 95% confidence, that a toddler who sits at the table for 33 minutes will consume between 409 and 509 calories.
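
The confidence interval for the mean and the prediction interval for an individual differ only in the standard error used. The sketch below puts the two side by side, using the summary numbers quoted on the slides (fitted equation 561.2 - 3.095 × Time, s² = 545.5, n = 20, mean time 34, se(B) = 0.851, t value 2.101). The function name is made up for illustration, and the results should agree with the slide values up to rounding.

```python
from math import sqrt


def fit_and_prediction_intervals(y_hat, s2, n, x0, x_bar, se_b, t_crit):
    """Return se(fit), the CI for the mean response, and the PI for one response."""
    # uncertainty in the estimated mean at x0
    se_fit = sqrt(s2 / n + (x0 - x_bar) ** 2 * se_b ** 2)
    # add the variability of an individual about that mean
    se_pred = sqrt(s2 + se_fit ** 2)
    ci = (y_hat - t_crit * se_fit, y_hat + t_crit * se_fit)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return se_fit, ci, pi


y_hat = 561.2 - 3.095 * 33  # fitted value at 33 minutes
se_fit, ci, pi = fit_and_prediction_intervals(
    y_hat, s2=545.5, n=20, x0=33, x_bar=34, se_b=0.851, t_crit=2.101
)

print(f"se(fit) = {se_fit:.2f}")
print(f"95% CI for the mean:      ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for an individual: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is much wider because s² dominates the term under the square root: most of the uncertainty about one toddler comes from toddler-to-toddler variation, not from estimating the line.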


Corn Yield and Fertiliser

Comparing Three Approaches

1. Regression Model
   (Explanatory treated as numeric; note that we only have 2 values for the explanatory variable!)
2. One-way ANOVA
   (Explanatory treated as categorical)
3. 2 sample t-test assuming equal variance
   (Explanatory treated as categorical)
Points to note:
- Models compared, estimating parameters
- Conclusion from a test in each case
- $s_{pooled} = s_{resid}$ in each case
- the t-value for the independent 2 sample t-test and the F-ratio in the ANOVA table are related: with two groups, $F = t^2$

Approach 1: Regression

Approach 2: One-way ANOVA

Approach 3: Independent 2 sample t-test


Regression models

For this course, we only consider regression models where:
- Response variable: numeric
- Explanatory variable: numeric
(Simple) straight line regression:
- Model the relationship between the response and the explanatory variable
- Interpret the relationship
- Diagnose if the model is "appropriate" (check assumptions)
- Make predictions (for the response) using the model.


Regression Analysis: an overview

Find an appropriate equation
- examine the scatterplot
- look at the residuals vs fits graph


Estimate parameters in model and interpret

- $\hat{\alpha}$, estimated intercept: the (estimated) value of the response variable when the explanatory variable = 0. se(A): get from Minitab.
- $\hat{\beta}$, estimated slope: the (estimated) increase in the response variable for a one unit increase in the explanatory variable (while holding other explanatory variables constant). se(B): get from Minitab.
- $\hat{\sigma}$, estimated error: $s_{resid}$, the (estimated) error. Indicates how the data varies about the model. Can be estimated using the ANOVA table (partitioning of variability).


Check the assumptions of the model
- independent observations (random sample)
- normally distributed errors
- constant (equal) variance
- correct model shape



CI estimates and Hypothesis Testing

- On the slope, $\beta$
  - Significant relationship: $H_0: \beta = 0$
  - Specific relationship: e.g. $H_0: \beta = 0.7$
- Using ANOVA Table and F-test
  - Significant relationship: $F_{df_1, df_2}$


Additional insights about the model using:
- Pearson's correlation coefficient: $r$
- Coefficient of determination: $R^2$
- Unusual observations:
  - Large standardised residuals ('extreme' in y-direction)
  - High influence points ('extreme' in the x-direction)


Making predictions using the model:
- Confidence intervals for the mean of the response variable ($\mu_{Y \mid x_0}$): $\hat{y} \pm t_{df_{error}} \times se(\text{fit})$
- Prediction intervals for an individual observation ($Y \mid x_0$): $\hat{y} \pm t_{df_{error}} \times se(\text{prediction})$
