ANCOVA
ANCOVA compares group means on a dependent variable while statistically controlling for one or more additional variables. These additional variables are called covariates because they share a substantial amount of variance with the dependent variable.
When working with experimental data, including covariates helps to reduce error variance and thereby increases the power of the statistical test.
The inclusion of covariates should be theoretically motivated and therefore decided prior to data analysis.
ANCOVA is a blend of analysis of variance (ANOVA) and regression.
It is similar to factorial ANOVA, in that it can tell us what additional information one can get by
considering one independent variable (factor) at a time, without the influence of the others.
Although ANCOVA is usually used when there are differences between your baseline groups, it
can also be used in pre-test/post-test analysis when regression to the mean affects your post-test
measurement.
The technique is also common in non-experimental research (e.g. surveys) and for quasi-
experiments (when study participants can’t be assigned randomly). However, this particular
application of ANCOVA is not always recommended.
Extension of Multiple Regression
When used as an extension of multiple regression, ANCOVA can compare the groups' regression lines to see which have different Y intercepts, provided the slopes of all the lines are equal (a sketch of this idea follows below).
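As a minimal sketch of this idea in SAS (the data set mydata and the variables y, x, and group are hypothetical placeholders), a common slope is fitted for the covariate while the factor terms capture group-specific intercepts:

proc glm data=mydata;             /* hypothetical data set                    */
  class group;                    /* grouping factor                          */
  model y = x group / solution;   /* one common slope for x; the group terms  */
                                  /* shift the intercept for each group       */
run;

A significant group effect in this model indicates that the (parallel) regression lines have different Y intercepts.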
ANCOVA removes the effect of covariates: variables we do not want to study.
For example, I might want to study how different levels of teaching skill affect student performance in math, but it may not be possible to randomly assign students to classrooms. I then need to account for systematic differences between the students in different classes (e.g. different initial levels of math skill between gifted and mainstream students).
Example
You might want to find out if a new drug works for depression.
The study has three treatment groups and one control group.
ANCOVA can control for other factors that might influence the outcome. For
example: family life, job status, or drug use.
Extension of ANOVA
One can include several covariates. However, the more covariates are entered, the fewer degrees of freedom remain.
Entering a weak covariate is not a good idea, as it reduces statistical power. The lower the power, the less likely you are to be able to rely on the results of your test.
Strong covariates have the opposite effect: they can increase the power of your test.
Before running the test, make sure the observations are independent; in other words, do not put people into more than one group.
Software can usually check the remaining assumptions, as illustrated below.
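As one hedged illustration of such a software check (again using the hypothetical names mydata, y, x, and group), the residuals from the fitted ANCOVA model can be saved and examined for normality:

proc glm data=mydata;
  class group;
  model y = x group;               /* ANCOVA: covariate plus treatment factor */
  output out=diag r=resid;         /* save residuals for diagnostics          */
run;

proc univariate data=diag normal;  /* normality tests for the residuals       */
  var resid;
run;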
Example: Fruit Weight Gain
A horticulturist compares three fertilizer treatments for fruiting plants: a standard fertilizer, the standard fertilizer + Supplement Q, and the standard fertilizer + Supplement R. The response is the weight gain of the fruit; the initial weight of each fruit is also recorded.
[Figure: dot plots of initial weight for the Standard Fertilizer, + Supplement Q, and + Supplement R groups.]
Plotting the initial weights by group shows that the groups were not equal in initial weight.
Weight Gain vs. Initial Weight
[Figure: growth curve of fruit weight (kg) against age under the standard fertilizer; two fruits entering the study at different ages have different initial weights (wi) and different weight gains (w gain) up to their final weights (wF).]
If the fruits taken for the study are of different ages, they have different initial weights and are at different points on the growth curve. The expected weight gain will therefore differ depending on age at entry into the study.
Regression of Weight Gain on Initial Weight
[Figure: weight gain (g/day, Y) plotted against initial weight (x); the two fruits' weight gains lie on a straight line relating initial weight (wi) to weight gain.]
If we disregard the age of the fruit and instead focus on the initial weight, we see that there is a linear relationship between initial weight and the expected weight gain.
Covariates
Initial weight in the previous example is a covariable or covariate.
In the example above, had the horticulturist known that the fruits were highly variable in initial weight (or age), he could have:
• Created blocks of 3 or 6 equal weight fruits, and randomized treatments
to the plants within these blocks.
• This would have entailed some cost in terms of time spent sorting the fruit
plants and then keeping track of block membership over the life of the
study.
• It was much easier to simply record each fruit's initial weight and then use
analysis of covariance for the final analysis.
• In many cases, due to the continuous nature of the covariate, blocking is
just not feasible.
Expectations under H0
Under H0 (no treatment effects): if all fruits had come in with the same initial weight, all three treatments would produce the same weight gain.
[Figure: expected weight gain (g/day, Y) vs. initial weight (x); under H0 the three treatments share a single regression line, read off at the average fruit weight.]
Expectations under HA
Under HA (significant treatment effects): the treatments produce different expected weight gains.
[Figure: expected weight gain (g/day, Y) vs. initial weight (x); the Standard fertilizer (c), + Supplement Q (q), and + Supplement R (r) groups follow separate parallel lines with expected gains WGS, WGQ, and WGR.]
Observed Responses under HA
Suppose now that the different supplements actually do increase weight gain. This translates into fruiting plants in the different treatment groups following different, but parallel, regression lines against initial weight.
[Figure: observed weight gain (g/day, Y) vs. initial weight (x) for the Standard fertilizer (c), + Supplement Q (q), and + Supplement R (r) groups; the groups follow parallel lines at levels WGS, WGQ, and WGR.]
What difference in weight gain is due to initial weight, and what is due to treatment?
Observed Group Means
A simple one-way classification ANOVA (without accounting for initial weight) gives us the wrong answer!
[Figure: the same scatter of weight gain vs. initial weight, with horizontal lines marking the unadjusted treatment means ȳc, ȳq, and ȳr.]
Predicted Average Responses
The expected weight gain is computed for each treatment at the average initial weight, and comparisons are then made.
[Figure: the same scatter with the three parallel fitted lines; the adjusted treatment means ŷc|X=x̄, ŷq|X=x̄, and ŷr|X=x̄ are read off the fitted lines at the average initial weight X = x̄.]
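In symbols (the standard form of a covariate-adjusted treatment mean, stated here for clarity rather than copied from the slides):

  adjusted mean for treatment i = ȳ_i - β̂ (x̄_i - x̄),

where ȳ_i is the observed mean weight gain in group i, x̄_i is that group's mean initial weight, x̄ is the overall mean initial weight, and β̂ is the common estimated slope.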
ANCOVA: Objectives
The objective of an analysis of covariance is to compare the treatment means after adjusting for differences among the treatment groups in their covariate levels.
The analysis proceeds by combining a regression model with an analysis of variance model.
Model
  E(y_ij) = μ + α_i + β x_ij
The α_i, i = 1, …, t, describe how each of the t treatments modifies the overall mean response μ, and β is the slope relating the response to the covariate x, assumed common to all treatments. (The index j = 1, …, n runs over the n replicates for each treatment.)
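The treatment comparison then amounts to testing (stated explicitly here; the slides leave it implicit):

  H0: α_1 = α_2 = … = α_t = 0   versus   HA: at least one α_i ≠ 0,

with β estimated jointly from all of the treatment groups.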
A Priori Assumptions
The covariate is related to the response, and can account for variation
in the response.
Check with a scatterplot of Y vs. X.
Example
Four different formulations of an industrial glue are being tested. The tensile strength (response) of the glue is known to be related to the thickness as applied. Five observations on strength (Y) in pounds and thickness (X) in 0.01 inches are made for each formulation.
Here:
• There are t = 4 treatments (formulations of glue).
• The covariate X is the thickness of the applied glue.
• Each treatment is replicated n = 5 times, at different values of X.
The data, as (Strength Y, Thickness X) pairs:
Formulation 1: (46.5, 13) (45.9, 14) (49.8, 12) (46.1, 12) (44.3, 14)
Formulation 2: (48.7, 12) (49.0, 10) (50.1, 11) (48.5, 12) (45.2, 14)
Formulation 3: (46.3, 15) (47.1, 14) (48.9, 11) (48.2, 11) (50.3, 10)
Formulation 4: (44.7, 16) (43.0, 15) (51.0, 10) (48.1, 12) (46.8, 11)
Formulation Profiles
[Figure: scatterplot of Strength (Y, roughly 40 to 52 pounds) against Thickness (X) for the four formulations.]
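A plot of this kind could be produced with PROC SGPLOT (a sketch; it assumes the glue data set created in the SAS program that follows):

proc sgplot data=glue;
  scatter y=Strength x=Thickness / group=Formulation;  /* one plotting symbol per formulation */
run;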
SAS Program
The basic model is a combination of regression and one-way classification.

data glue;
  input Formulation Strength Thickness;
  datalines;
1 46.5 13
1 45.9 14
1 49.8 12
1 46.1 12
1 44.3 14
2 48.7 12
2 49.0 10
2 50.1 11
2 48.5 12
2 45.2 14
3 46.3 15
3 47.1 14
3 48.9 11
3 48.2 11
3 50.3 10
4 44.7 16
4 43.0 15
4 51.0 10
4 48.1 12
4 46.8 11
;
run;

proc glm data=glue;
  class Formulation;                                  /* one-way classification factor        */
  model Strength = Thickness Formulation / solution;  /* covariate (regression) + treatments  */
  lsmeans Formulation / stderr pdiff;                 /* adjusted means and pairwise p-values */
run;
Output: use the Type III sums of squares to test the significance of each variable.

Parameter         Estimate          Standard Error   t Value   Pr > |t|
Intercept         58.93698630 B     2.21321008        26.63    <.0001
Thickness         -0.95445205       0.16705494        -5.71    <.0001
Formulation 1     -0.00910959 B     0.80810401        -0.01    0.9912
Formulation 2      0.62554795 B     0.82451389         0.76    0.4598
Formulation 3      0.86732877 B     0.81361075         1.07    0.3033
Formulation 4      0.00000000 B     .                  .        .

(The 'B' flag indicates that these estimates are not unique: SAS sets the last Formulation level to zero and estimates the others as deviations from it.)
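The model above assumes a common Thickness slope for all formulations. One way to check this parallel-slopes assumption (an added diagnostic, not part of the original slides) is to fit the Thickness*Formulation interaction and test it:

proc glm data=glue;
  class Formulation;
  model Strength = Thickness Formulation Thickness*Formulation;  /* interaction tests equality of slopes */
run;

A non-significant interaction is consistent with parallel regression lines, so the common-slope ANCOVA model is reasonable.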
Least Squares Means
(Adjusted Formulation means, computed at the average value of Thickness [= 12.45].)
The GLM Procedure: Least Squares Means
p-values for the pairwise comparisons of the adjusted means (pdiff):

i/j        1         2         3         4
1                    0.4574    0.3011    0.9912
2          0.4574              0.7695    0.4598
3          0.3011    0.7695              0.3033
4          0.9912    0.4598    0.3033
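As a worked check (derived from the estimates above, not additional SAS output), the adjusted mean for Formulation 1 at the average thickness of 12.45 is

  58.937 + (-0.00911) + (-0.95445)(12.45) ≈ 47.04 pounds,

which agrees with the adjusted-mean formula ȳ_1 - β̂(x̄_1 - x̄) = 46.52 - (-0.95445)(13.00 - 12.45) ≈ 47.04. With every pairwise p-value above 0.30, none of the adjusted formulation means differ significantly.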