Analysis of Pretest and Posttest Scores
I. Overview
In previous sets of notes in this series we analyzed a pretest-posttest, two-group, quasi-
experimental design using blocking, matching, and analysis of covariance procedures.
Those procedures were used to analyze the differences in posttest scores after
any pretest score differences were "held constant." In this set of notes we will take a
different approach and look at the change from pretest to posttest scores.
Hypothetical pretest and posttest trait anxiety means for a two group design are shown
in Figure 1. The data that we displayed as a scattergram in the analysis of covariance
notes are redisplayed here using the pretest and posttest means within each treatment
condition. The question of interest is whether the improvement in scores from pretest to
posttest is greater for the treatment group than it is for the control group.
The question can be answered by computing the difference between the pretest and posttest
scores for each person and then analyzing those differences in a one way ANOVA using
treatment (treatment vs. control) as the only factor. If the treatment main effect is
significant, then the change from pretest to posttest is not the same in the two groups.
This analysis of difference scores is also called a gain score analysis.
Another way of answering this question is by looking at the interaction effect in a 2 x 2 analysis
of variance (ANOVA) with treatment (treatment vs. control) as a between subjects factor
and time (pretest vs. posttest) as a within subjects factor. If the interaction is significant,
then the change between pretest and posttest is not the same in the two treatment
conditions.
It will be shown that the treatment by time interaction effect in the 2 x 2 analysis of
variance yields identical statistical results to the treatment main effect in the gain score
analysis.
II. Analysis of Variance of Gain Scores
The general approach to a gain score analysis is: (a) compute the gain score for each
person, and then (b) analyze those gain scores in an analysis of variance with treatment
as the between-subjects factor.
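The two steps can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure used in these notes: the scores are hypothetical (they are not the trait anxiety data analyzed here), and `one_way_anova` is a helper name introduced only for this example.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical (pretest, posttest) scores; NOT the data from these notes.
treatment = [(52, 38), (48, 30), (55, 41), (50, 36), (47, 29)]
control = [(45, 44), (43, 40), (49, 50), (44, 41), (46, 45)]

# Step (a): gain = posttest - pretest, so a negative gain is a decrease in anxiety.
gain_treatment = [post - pre for pre, post in treatment]
gain_control = [post - pre for pre, post in control]

# Step (b): one-way ANOVA on the gain scores with treatment group as the only factor.
f_ratio, df1, df2 = one_way_anova([gain_treatment, gain_control])
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```

With only two groups, this F is the square of the independent-samples t statistic computed on the same gain scores.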
The Results
The abbreviated analysis of variance output is shown in Table 2. The means, standard
errors, and 95% confidence intervals for each mean are shown in Table 3. The results can
be summarized as follows:
Trait anxiety gain scores (posttest - pretest) were analyzed in an analysis of variance
with treatment group (treatment vs. control) as the independent variable. The decrease in
trait anxiety was greater for participants in the treatment condition (M = -15.93,
SE = 1.46) than for those in the control condition (M = -1.58, SE = 1.50),
F (1, 76) = 47.14, p < .0005.

Table 2. Tests of Between-Subjects Effects
Dependent Variable: COMPUTE gain = tanxpost - tanxpre
Source      Type III Sum of Squares   df   Mean Square   F        Sig.
TREATGRP    4010.641                  1    4010.641      47.140   .000
Error       6466.038                  76   85.079
Total       16705.000                 78

Table 3. Means, Standard Errors, and 95% Confidence Intervals for the Two Treatment Conditions
Dependent Variable: COMPUTE gain = tanxpost - tanxpre
                                     95% Confidence Interval
Condition   Mean      Std. Error     Lower Bound   Upper Bound
Treatment   -15.925   1.458          -18.830       -13.020
Control     -1.579    1.496          -4.559        1.401
Overall Analysis
The primary output from the analysis of variance is divided into two tables, the within
subjects effects, see Table 5, and the between subjects effects, see Table 6. The output
has been abbreviated somewhat for the purposes of this discussion.
As shown in Table 5, the interaction between treatment and time is significant,
F (1, 76) = 47.14, p < .0005. The interaction will be interpreted with simple main effects
analysis looking at the effects of time within each treatment. The significant time main
effect, F (1, 76) = 70.18, p < .0005, must be interpreted in light of the interaction
effect. As shown in Table 6, the main effect for treatment was not significant,
F (1, 76) = 0.14, p = .714.

Table 5. Tests of Within-Subjects Effects
Source            Type III Sum of Squares   df   Mean Square   F        Sig.
TIME              2985.321                  1    2985.321      70.177   .000
TIME * TREATGRP   2005.321                  1    2005.321      47.140   .000
Error(TIME)       3233.019                  76   42.540

Table 6. Tests of Between-Subjects Effects
Source      Type III Sum of Squares   df   Mean Square   F       Sig.
TREATGRP    19.206                    1    19.206        .135    .714
Error       10800.323                 76   142.110
Conceptually, the interaction term in this 2 x 2 ANOVA can be thought of as a
comparison of the changes from pretest to posttest within each treatment group (see
the formula below). If the changes from pretest to posttest are identical in each group,
e.g., if the improvement is the same for each group, then there is no interaction. If the
change from pretest to posttest is greater in one group than the other group, e.g., if one
group improves more than the other group, then there is an interaction. An interaction
could also occur if one group improved from pretest to posttest while the other group
deteriorated.
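The comparison described above can be written as a difference of differences between cell means. The following is a sketch; the symbol psi and the subscripted cell means are notation introduced here for illustration.

```latex
\psi = \left(\bar{Y}_{\text{treatment, post}} - \bar{Y}_{\text{treatment, pre}}\right)
     - \left(\bar{Y}_{\text{control, post}} - \bar{Y}_{\text{control, pre}}\right)
```

If psi is 0, the change is the same in both groups and there is no interaction. Using the mean gain scores reported in Table 3, the sample value is (-15.925) - (-1.579) = -14.346, that is, trait anxiety dropped about 14.3 points more in the treatment condition than in the control condition.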
IV. Discussion
Alternative explanations
Both the gain score analysis and the repeated measures analysis ignore the
(significant) pretest differences on trait anxiety. Can you think of any alternative
explanations for this outcome that are based on the existing pretest differences? For
example, can regression toward the mean account for the pattern of results?
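To see why regression toward the mean is a plausible rival explanation, consider a small simulation. This is a hypothetical sketch, not a reanalysis of the data in these notes. It assumes each score is a stable true score plus independent measurement error, that no one receives any treatment effect, and that people with high pretest scores end up in one group. All of the numeric settings (mean 50, standard deviations 10 and 5, cutoff 55) are arbitrary choices made for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

def simulate_person():
    """Pretest and posttest share one true score; each adds its own random error."""
    true_score = random.gauss(50, 10)
    pretest = true_score + random.gauss(0, 5)
    posttest = true_score + random.gauss(0, 5)  # no treatment effect for anyone
    return pretest, posttest

people = [simulate_person() for _ in range(5000)]

# Non-random assignment (assumed): high pretest scorers form one group.
high_pre = [(pre, post) for pre, post in people if pre > 55]
low_pre = [(pre, post) for pre, post in people if pre <= 55]

gain_high = mean(post - pre for pre, post in high_pre)
gain_low = mean(post - pre for pre, post in low_pre)

print(f"mean gain, high-pretest group: {gain_high:.2f}")
print(f"mean gain, low-pretest group:  {gain_low:.2f}")
```

Under these assumptions the high-pretest group shows a decrease and the low-pretest group an increase even though nothing was done to either group, so a gain score difference between groups that differ at pretest does not by itself establish a treatment effect.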
The interaction is a comparison of the differences between the posttest and pretest
scores in each treatment group. As we noted earlier, if the difference is the same in
each treatment group, there is no interaction. If the difference is not the same in each
treatment group, then there is an interaction. Most computer programs such as SPSS
handle the within subjects factor, e.g., time, by literally creating a difference score for
each person by subtracting the pretest score from the posttest score. The test of the
main effect of time is a test of whether the overall mean difference score (across both
treatment groups) is different from zero. The test of the interaction is a test of whether
the mean difference score for the treatment group is different from the mean difference
score for the control group. In the gain score analysis we first computed the difference
between the posttest and pretest scores and then tested whether the differences were
the same for each treatment group. Thus the treatment main effect in the gain score
analysis is the same as the time by treatment interaction in the 2 x 2 ANOVA.
The interaction term in the ANOVA was significant. The details of the interaction were
analyzed using a simple main effects analysis of the effects of time within each
treatment condition. The simple main effects analysis indicated a significant change
from pretest to posttest in the treatment condition, but not in the control
condition. Similarly, the treatment main effect in the gain score analysis was
significant. The details of the main effect were analyzed using the 95% confidence
intervals for each of the group means. The 95% confidence interval analysis indicated a
significant change from pretest to posttest in the treatment condition, but not in the
control condition.
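The confidence interval reasoning can be checked by rebuilding the intervals from the means and standard errors in Table 3. This is a sketch: the critical value of about 1.992 for t with 76 error degrees of freedom is an assumption taken from a t table, not a figure given in these notes, and `confidence_interval` is a helper name introduced here.

```python
# Gain score means and standard errors as reported in Table 3.
table3 = {
    "Treatment": (-15.925, 1.458),
    "Control": (-1.579, 1.496),
}

# Assumption: two-tailed .05 critical value of t with 76 df, read from a t table.
T_CRIT = 1.9917

def confidence_interval(m, se, t_crit=T_CRIT):
    """Return (lower, upper) bounds of the interval m +/- t_crit * se."""
    return m - t_crit * se, m + t_crit * se

for condition, (m, se) in table3.items():
    lower, upper = confidence_interval(m, se)
    excludes_zero = lower > 0 or upper < 0
    print(f"{condition}: [{lower:.3f}, {upper:.3f}] excludes zero: {excludes_zero}")
```

An interval that excludes zero corresponds to a mean gain that differs significantly from zero at the .05 level, which is the sense in which the treatment condition changed and the control condition did not.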
Technical note. You may have noted that although the F values for the gain score main
effect and ANOVA interaction effect are the same, the sums of squares are not the
same. This is due to the way in which SPSS creates the difference scores. Think of
creating the difference score by multiplying the individual scores by a coefficient (or
weight) called "c" -
Gain = c1*posttest + c2*pretest
When we computed the gain scores, c1 was set to +1 and c2 was set to -1; that is, we
simply subtracted the pretest score from the posttest score -
Gain = (+1)*posttest + (-1)*pretest
SPSS "orthonormalizes" the coefficients so that the sum of the squares of the
coefficients is equal to 1.00. The coefficients used by SPSS are as follows -
Gain = (+0.707107)*posttest + (-0.707107)*pretest
If you square each of the coefficients (0.707107² = .5000) and sum them, the result is
1.00.
You could check this out for yourself by using the SPSS coefficients to manually create
the gain score and then run the gain score analysis. You would find that both the sums
of squares and the F value from the gain score analysis would equal the sums of
squares and F value from the interaction term in the ANOVA.
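The check can also be sketched outside SPSS. The example below uses hypothetical scores (not the data from these notes) and helper names introduced here for illustration. It runs the same one-way ANOVA on difference scores built with coefficients of +1 and -1 and again with +0.707107 and -0.707107.

```python
from statistics import mean

def anova_on_differences(groups):
    """One-way ANOVA on difference scores; returns (SS effect, SS error, F)."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_effect = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_effect = len(groups) - 1
    df_error = len(scores) - len(groups)
    f_ratio = (ss_effect / df_effect) / (ss_error / df_error)
    return ss_effect, ss_error, f_ratio

def difference_scores(pairs, c_post, c_pre):
    """Weighted difference c_post*posttest + c_pre*pretest for each (pre, post) pair."""
    return [c_post * post + c_pre * pre for pre, post in pairs]

# Hypothetical (pretest, posttest) scores; NOT the data from these notes.
treatment = [(52, 38), (48, 30), (55, 41), (50, 36), (47, 29)]
control = [(45, 44), (43, 40), (49, 50), (44, 41), (46, 45)]

C = 0.5 ** 0.5  # 0.707107, so that C**2 + C**2 = 1.00

raw = anova_on_differences(
    [difference_scores(treatment, 1, -1), difference_scores(control, 1, -1)]
)
scaled = anova_on_differences(
    [difference_scores(treatment, C, -C), difference_scores(control, C, -C)]
)

print(f"raw:    SS effect = {raw[0]:.3f}, SS error = {raw[1]:.3f}, F = {raw[2]:.3f}")
print(f"scaled: SS effect = {scaled[0]:.3f}, SS error = {scaled[1]:.3f}, F = {scaled[2]:.3f}")
```

Multiplying every difference score by 0.707107 multiplies each sum of squares by 0.5 and leaves their ratio unchanged. The same pattern appears in the tables above: the interaction and error sums of squares in Table 5 (2005.321 and 3233.019) are half of the treatment and error sums of squares in Table 2 (4010.641 and 6466.038), while F is 47.140 in both.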