
ANCOVA

Analysis of covariance (ANCOVA)

A statistical method that allows accounting for third variables when investigating the relationship between an independent and a dependent variable.

These third variables are called covariates because they share a substantial amount of variance with the dependent variable.

The relationship between the covariate and the dependent variable is statistically controlled (or partialled out) when looking at the relationship between the independent and dependent variables.
ANCOVA can be applied in the context of both analysis of variance and regression analysis, and it can be used with experimental as well as non-experimental data. However, depending on the research design, ANCOVA serves different goals.

When working with experimental data, it helps to reduce error variance, thus increasing the power of the statistical test.

For this to work, independent variable and covariate should be independent.

With non-experimental data, ANCOVA can help to identify the unique contribution of an independent variable in explaining the dependent variable, and to rule out alternative explanations for the relationship.

The inclusion of covariates should follow theoretical assumptions and thus be decided prior to data analysis.
ANCOVA is a blend of analysis of variance (ANOVA) and regression.

It is similar to factorial ANOVA, in that it can tell us what additional information one can get by
considering one independent variable (factor) at a time, without the influence of the others.

It can be used as:

• An extension of multiple regression, to compare multiple regression lines

• An extension of analysis of variance

Although ANCOVA is usually used when there are differences between your baseline groups, it
can also be used in pre-test/post-test analysis when regression to the mean affects your post-test
measurement.

The technique is also common in non-experimental research (e.g. surveys) and for quasi-
experiments (when study participants can’t be assigned randomly). However, this particular
application of ANCOVA is not always recommended.
Extension of Multiple Regression

When used as an extension of multiple regression, ANCOVA can test all of the
regression lines to see which have different Y intercepts as long as the slopes for all
lines are equal.

Like regression analysis, ANCOVA enables us to look at how an independent variable acts on a dependent variable.

ANCOVA removes any effect of covariates, which are variables we don’t want to study.

For example, I might want to study how different levels of teaching skill affect student performance in math. It may not be possible to randomly assign students to classrooms, so I'll need to account for systematic differences between the students in different classes (e.g. different initial levels of math skill between gifted and mainstream students).
Example

You might want to find out if a new drug works for depression.

The study has three treatment groups and one control group.

A regular ANOVA can tell you if the treatment works.

ANCOVA can control for other factors that might influence the outcome. For
example: family life, job status, or drug use.
Extension of ANOVA

As an extension of ANOVA, ANCOVA can be used in two ways:

1. To control for covariates (typically continuous variables, or variables on a particular scale) that aren't the main focus of your study.

2. To study combinations of categorical and continuous variables, or variables on a scale, as predictors. In this case, the covariate is a variable of interest (as opposed to one you want to control for).
Within-Group Variance

ANCOVA can explain within-group variance. It takes the unexplained variance from the ANOVA test and tries to explain it with confounding variables (or other covariates).

One can use multiple covariates. However, the more covariates are entered, the fewer the degrees of freedom.
Entering a weak covariate isn't a good idea, as it will reduce statistical power; the lower the power, the less likely it is that you can rely on the results of your test.
Strong covariates have the opposite effect: they can increase the power of your test.

General steps for ANCOVA

1. Run a regression between the independent and dependent variables.
2. Identify the residual values from the results.
3. Run an ANOVA on the residuals.
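The three steps above can be sketched in plain Python. This is a minimal, illustrative version: the toy data and helper functions are invented for the example, and real ANCOVA software fits the combined regression + ANOVA model directly rather than literally running an ANOVA on residuals.

```python
def simple_regression(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups (lists of values)."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_b = len(groups) - 1
    df_w = len(all_vals) - len(groups)
    return (ssb / df_b) / (ssw / df_w)

# Step 1: regress the dependent variable on the covariate.
x = [1, 2, 1, 2]              # covariate (made-up data)
y = [5.0, 7.2, 6.0, 7.8]      # response
group = ['A', 'A', 'B', 'B']  # treatment labels
a, b = simple_regression(x, y)

# Step 2: identify the residuals from that regression.
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Step 3: run a one-way ANOVA on the residuals, by group.
by_group = {}
for g, r in zip(group, resid):
    by_group.setdefault(g, []).append(r)
f_stat = one_way_f(list(by_group.values()))
```

With this toy data the fitted slope is 2 and the group differences that remain in the residuals produce a large F statistic, i.e. a group effect after adjusting for the covariate.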
Assumptions for ANCOVA

Assumptions are basically the same as the ANOVA assumptions.

Check that the following are true before running the test:

1. Independent variables (minimum of two) should be categorical variables.

2. The dependent variable and the covariate should be continuous variables (measured on an interval or ratio scale).

3. Make sure observations are independent; in other words, don't put people into more than one group.
Software can usually check the following assumptions:

1. Normality: the dependent variable should be roughly normally distributed for each category of the independent variable.

2. Data should show homogeneity of variance.

3. The covariate and the dependent variable (at each level of the independent variable) should be linearly related.

4. Homoscedasticity: the variance of Y should be similar for each value of X.

5. The covariate and the independent variable shouldn't interact; in other words, there should be homogeneity of regression slopes.
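Assumption 5 (homogeneity of regression slopes) can be checked informally by fitting a separate slope of the dependent variable on the covariate within each group and comparing them. The formal test adds a group-by-covariate interaction to the model; the sketch below, with made-up data, only eyeballs the per-group slopes.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Invented data: each group is (covariate values, response values).
groups = {
    "control":   ([1, 2, 3, 4], [2.0, 4.1, 5.9, 8.0]),
    "treatment": ([1, 2, 3, 4], [5.0, 7.0, 9.1, 11.0]),
}
slopes = {name: slope(x, y) for name, (x, y) in groups.items()}
spread = max(slopes.values()) - min(slopes.values())
# A small spread makes the parallel-lines assumption plausible; a large
# spread suggests an interaction and calls for a more complex model.
```

Here both within-group slopes come out near 2, so the regression lines are approximately parallel and the standard ANCOVA model is reasonable.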
• A procedure for comparing treatment means that incorporates information on a quantitative explanatory variable, X, sometimes called a covariate.

• The procedure, ANCOVA, is a combination of ANOVA with regression.
Example: Pumpkin Weight Gain
• A horticulturist wishes to examine the impact of a pair of new soil
supplements on pumpkin weight gain (response).
• Three treatments are defined: standard fertilizer, standard fertilizer +
supplement Q, and standard fertilizer + supplement R.
• All new fruits from a large population are available for use as study units. He selects 30 pumpkins for study. These are assigned to the three fertilizers at random (completely randomized design).
• Initial weights are recorded when fruit formation has begun; the plants are then placed on the three fertilizers. At the end of the fruiting period the final weight is taken and weight gain is computed.
• Simple analysis of variance and associated multiple comparison procedures
indicate no significant differences in weight gain between the two
supplements, but big differences between the supplements and the standard
fertilizer.
• Is this the end of the story? …
ANOVA Results

[Figure: dot plots of average weight gain (response, g/day) for the three groups: Standard Fertilizer, + Supplement Q, and + Supplement R]

Simple ANOVA of a one-way classification would suggest no difference between Supplements Q and R, but both differ from the standard fertilizer.
Initial Weights

[Figure: dot plots of initial weight for the three groups: Standard Fertilizer, + Supplement Q, and + Supplement R]

Plotting the initial weights by group shows that the groups were not equal when it came to initial weights.
Weight Gain to Initial Weight

[Figure: growth curve of weight versus age under the standard fertilizer, showing two fruits entering the study at different initial weights (wi) and achieving different weight gains (w gain)]

If the fruits taken for the study are at different ages, they have different initial weights and are at different points on the growth curve. Expected weight gains will differ depending on age at entry into the study.
Regression of Initial Weight to Weight Gain

[Figure: scatterplot of weight gain (g/day, Y) versus initial weight (x), showing a straight-line relationship]

If we disregard the age of the fruit and instead focus on the initial weight, we see that there is a linear relationship between initial weight and the expected weight gain.
Covariates
Initial weight in the previous example is a covariable or covariate.

A covariate is a disturbing variable (confounder); that is, it is known to have an effect on the response. Usually the covariate can be measured, but often we may not be able to control its effect through blocking.

In the EXAMPLE, had the horticulturist known that the fruits were very
variable in initial weight (or age), he could have:
• Created blocks of 3 or 6 fruits of equal weight, and randomized treatments to the plants within these blocks.
• This would have entailed some cost in terms of time spent sorting the fruit
plants and then keeping track of block membership over the life of the
study.
• It was much easier to simply record the fruit initial weight and then use
analysis of covariance for the final analysis.
• In many cases, due to the continuous nature of the covariate, blocking is
just not feasible.
Expectations under H0

Under H0 (no treatment effects): if all fruits had come in with the same initial weight, all three treatments would produce the same weight gain.

[Figure: expected weight gain (g/day, Y) versus initial weight (x); a single regression line serves all three treatments, evaluated at the average fruit weight]
Expectations under HA

Under HA (significant treatment effects): different treatments produce different weight gains for fruits of the same initial weight.

[Figure: expected weight gain (g/day, Y) versus initial weight (x); three parallel regression lines for + Supplement Q, + Supplement R, and standard fertilizer (c), giving different expected gains WGQ, WGR, WGS at the average fruit weight]
Different Initial Weights

Under H0 (no treatment effects): if the average initial weights in the treatment groups differ, the observed weight gains will be different, even if treatments have no effect.

[Figure: a single regression line of expected weight gain (g/day, Y) on initial weight (x); because the groups (c, q, r) sit at different initial weights, their expected gains WGS, WGQ, WGR differ along the line]
Observed Responses under HA

Suppose now that different supplements actually do increase weight gain. This translates to fruiting plants in different treatment groups following different, but parallel, regression lines with initial weight.

[Figure: weight gain (g/day, Y) versus initial weight (x) under HA (significant treatment effects); parallel regression lines for standard fertilizer (c), + Supplement Q (q), and + Supplement R (r), with the group clouds at different initial weights]

What difference in weight gain is due to initial weight, and what is due to treatment?
Observed Group Means

Simple one-way classification ANOVA (without accounting for initial weight) gives us the wrong answer!

[Figure: the same parallel regression lines, with the unadjusted treatment means (yc, yq, yr) marked; because the groups differ in initial weight, these raw means are misleading]
Predicted Average Responses

Expected weight gain is computed for each treatment at the average initial weight, and comparisons are then made.

[Figure: the same parallel regression lines, with adjusted treatment means yc|X=x, yq|X=x, yr|X=x read off each line at the common initial weight X = x-bar]
ANCOVA: Objectives

The objective of an analysis of covariance is to compare the treatment means after adjusting for differences among the treatments due to differences in the covariate levels of the treatment groups.

The analysis proceeds by combining a regression model with an analysis of variance model.
Model

E(yij) = μ + αi + β·xij

The αi, i = 1, …, t, are estimates of how each of the t treatments modifies the overall mean response μ. (The index j = 1, …, n runs over the n replicates for each treatment.)

The slope coefficient, β, is a measure of how the average response changes as the value of the covariate changes.

The analysis proceeds by fitting a linear regression model with dummy variables to code for the different treatment levels.
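As a sketch of that dummy-variable fit, the following pure-Python example builds the design matrix for a two-treatment version of the model and solves the least-squares normal equations. The tiny data set is invented and constructed to follow the model exactly (mu = 10, alpha1 = 1, alpha2 = 0 as the reference level, beta = 2), so the fit recovers those values; it is an illustration, not production code.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

trt = [1, 1, 1, 2, 2, 2]                 # treatment labels (2 is reference)
x   = [1, 2, 3, 1, 2, 3]                 # covariate
y   = [13.0, 15.0, 17.0, 12.0, 14.0, 16.0]  # y = 10 + alpha_i + 2x exactly

# Design matrix columns: intercept, dummy for treatment 1, covariate.
X = [[1.0, 1.0 if t == 1 else 0.0, xi] for t, xi in zip(trt, x)]

# Normal equations X'X b = X'y.
XtX = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(3)]
       for i in range(3)]
Xty = [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(3)]
mu, alpha1, beta = solve(XtX, Xty)
```

Because the responses were generated from the model with no noise, the estimates come back exactly: mu = 10, alpha1 = 1, beta = 2. With real data the same fit yields the adjusted treatment effects and the common slope.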
A Priori Assumptions

The covariate is related to the response, and can account for variation in the response.
Check with a scatterplot of Y vs. X.

The covariate is NOT related to the treatments.
If X is related to the treatments, the variance of the treatment differences is increased relative to that obtained from an ANOVA model without X, which results in a loss of precision.

The treatments' regression equations are linear in the covariate.
Check with a scatterplot of Y vs. X for each treatment. Non-linearity can be accommodated (e.g. polynomial terms, transforms), but the analysis may be more complex.

The regression lines for the different treatments are parallel.
This means there is only one slope in the Y vs. X plots. Non-parallel lines can be accommodated, but this complicates the analysis, since differences in treatments will now depend on the value of the covariate.
Example

Four different formulations of an industrial glue are being tested. The tensile strength (response) of the glue is known to be related to the thickness as applied. Five observations on strength (Y) in pounds, and thickness (X) in 0.01 inches, are made for each formulation.

Here:
• There are t = 4 treatments (formulations of glue).
• Covariate X is the thickness of the applied glue.
• Each treatment is replicated n = 5 times at different values of X.

Formulation  Strength  Thickness
1            46.5      13
1            45.9      14
1            49.8      12
1            46.1      12
1            44.3      14
2            48.7      12
2            49.0      10
2            50.1      11
2            48.5      12
2            45.2      14
3            46.3      15
3            47.1      14
3            48.9      11
3            48.2      11
3            50.3      10
4            44.7      16
4            43.0      15
4            51.0      10
4            48.1      12
4            46.8      11
Formulation Profiles

[Figure: scatterplot of Strength (Y), roughly 40.0 to 52.0, versus Thickness (X), with separate profiles for Form_1, Form_2, Form_3, and Form_4]
SAS Program

The basic model is a combination of regression and a one-way classification.

data glue;
input Formulation Strength Thickness;
datalines;
1 46.5 13
1 45.9 14
1 49.8 12
1 46.1 12
1 44.3 14
2 48.7 12
2 49.0 10
2 50.1 11
2 48.5 12
2 45.2 14
3 46.3 15
3 47.1 14
3 48.9 11
3 48.2 11
3 50.3 10
4 44.7 16
4 43.0 15
4 51.0 10
4 48.1 12
4 46.8 11
;
run;
proc glm;
  class formulation;
  model strength = thickness formulation / solution;
  lsmeans formulation / stderr pdiff;
run;
Output

Use Type III SS to test the significance of each variable.

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4   66.31065753      16.57766438   10.17     0.0003
Error             15   24.44684247       1.62978950
Corrected Total   19   90.75750000

R-Square   Coeff Var   Root MSE   Strength Mean
0.730636   2.691897    1.276632   47.42500

The Error mean square (1.62978950) is the MSE; each mean square is its sum of squares divided by its degrees of freedom.

Source        DF   Type I SS     Mean Square   F Value   Pr > F
Thickness      1   63.50120135   63.50120135   38.96     <.0001
Formulation    3    2.80945618    0.93648539    0.57     0.6405

Source        DF   Type III SS   Mean Square   F Value   Pr > F
Thickness      1   53.20115753   53.20115753   32.64     <.0001
Formulation    3    2.80945618    0.93648539    0.57     0.6405

The regression on thickness is significant; there are no formulation differences.

Parameter       Estimate         Standard Error   t Value   Pr > |t|
Intercept       58.93698630 B    2.21321008       26.63     <.0001
Thickness       -0.95445205      0.16705494       -5.71     <.0001
Formulation 1   -0.00910959 B    0.80810401       -0.01     0.9912
Formulation 2    0.62554795 B    0.82451389        0.76     0.4598
Formulation 3    0.86732877 B    0.81361075        1.07     0.3033
Formulation 4    0.00000000 B    .                 .        .
Least Squares Means

(Adjusted Formulation means, computed at the average value of Thickness [= 12.45].)

The GLM Procedure
Least Squares Means

Formulation   Strength LSMEAN   Standard Error   Pr > |t|   LSMEAN Number
1             47.0449486        0.5782732        <.0001     1
2             47.6796062        0.5811616        <.0001     2
3             47.9213870        0.5724527        <.0001     3
4             47.0540582        0.5739134        <.0001     4

Least Squares Means for effect Formulation
Pr > |t| for H0: LSMean(i) = LSMean(j)
Dependent Variable: Strength

i/j       1        2        3        4
1                  0.4574   0.3011   0.9912
2         0.4574            0.7695   0.4598
3         0.3011   0.7695            0.3033
4         0.9912   0.4598   0.3033
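These adjusted means can be reproduced by hand: for parallel lines, the pooled within-group slope is the sum of the within-group Sxy divided by the sum of the within-group Sxx, and each adjusted mean is the raw group mean shifted along that slope to the overall average thickness (12.45). A small pure-Python sketch using the glue data:

```python
# Glue data: formulation -> list of (strength, thickness) pairs.
data = {
    1: [(46.5, 13), (45.9, 14), (49.8, 12), (46.1, 12), (44.3, 14)],
    2: [(48.7, 12), (49.0, 10), (50.1, 11), (48.5, 12), (45.2, 14)],
    3: [(46.3, 15), (47.1, 14), (48.9, 11), (48.2, 11), (50.3, 10)],
    4: [(44.7, 16), (43.0, 15), (51.0, 10), (48.1, 12), (46.8, 11)],
}

sxy = sxx = 0.0
means = {}
for form, obs in data.items():
    ys = [y for y, _ in obs]
    xs = [x for _, x in obs]
    my, mx = sum(ys) / len(ys), sum(xs) / len(xs)
    means[form] = (my, mx)
    # Accumulate within-group cross-products and covariate sums of squares.
    sxy += sum((x - mx) * (y - my) for y, x in obs)
    sxx += sum((x - mx) ** 2 for x in xs)

b = sxy / sxx                      # pooled within-group slope
grand_x = sum(x for obs in data.values() for _, x in obs) / 20

# Adjusted (least-squares) mean = raw mean shifted to the grand mean thickness.
adjusted = {f: my - b * (mx - grand_x) for f, (my, mx) in means.items()}
```

The slope comes out as -0.9545, matching the Thickness estimate in the SOLUTION output, and the four adjusted means match the LSMEANs above (47.04, 47.68, 47.92, 47.05).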
