Analysis of Pretest and Posttest Scores

The document describes analyzing pretest and posttest scores from a quasi-experimental study using two methods: gain score analysis and repeated measures ANOVA. Gain score analysis involves computing the difference between pretest and posttest scores (the gain), and analyzing the gains using ANOVA. Repeated measures ANOVA analyzes pretest and posttest scores simultaneously using a 2x2 design with time (pretest vs. posttest) as a within-subjects factor and treatment (condition) as a between-subjects factor. Both methods were found to yield identical statistical results when comparing improvement between treatment and control groups from pretest to posttest.

Uploaded by

Aldrin Vingno
Analysis of Pretest and Posttest Scores with Gain Scores and Repeated Measures

I. Overview
In previous sets of notes in this series we analyzed a pretest-posttest, two-group, quasi-
experimental design using blocking, matching, and analysis of covariance procedures.
Those procedures were used to analyze the differences in posttest scores after
any pretest score differences were "held constant." In this set of notes we will take a
different approach and look at the change from the pretest to the posttest scores.
Hypothetical pretest and posttest trait anxiety means for a two group design are shown
in Figure 1. The data that we displayed as a scattergram in the analysis of covariance
notes are redisplayed here using the pretest and posttest means within each treatment
condition. The question of interest is whether the improvement in scores from pretest to
posttest is greater for the treatment group than it is for the control group.
The question can be answered by computing the difference between the pretest and
posttest scores for each person and then analyzing those differences in a one-way
ANOVA using treatment (treatment vs. control) as the only factor. If the treatment main
effect is significant, then the change from pretest to posttest is not the same in the two
groups. This analysis of difference scores is also called a gain score analysis.
Another way of answering this question is by looking at the interaction effect in a 2 x 2 analysis
of variance (ANOVA) with treatment (treatment vs. control) as a between subjects factor
and time (pretest vs. posttest) as a within subjects factor. If the interaction is significant,
then the change between pretest and posttest is not the same in the two treatment
conditions.
It will be shown that the treatment by time interaction effect in the 2 x 2 analysis of
variance yields identical statistical results to the treatment main effect in the gain score
analysis.
II. Analysis of Variance of Gain Scores
The general approach to a gain score analysis is: (a) to compute the gain score, and
then (b) analyze those gain scores in an analysis of variance with treatment as the
between-subjects factor.

Compute the Gain score


The improvement (gain) from pretest to posttest can be computed for each participant
by subtracting each person's pretest score from his or her posttest score -
Gain = posttest - pretest
The SPSS syntax for computing the gain score is as follows:
COMPUTE gain = posttest - pretest.
When you compute a gain score in this manner a positive gain score indicates that the
posttest score was greater than the pretest score, a negative gain score indicates that
the posttest score was less than the pretest score. In our example the dependent
variable is trait anxiety so we expect that successful treatment would lead to lower
anxiety. The gain score should be negative.
The gain score controls for individual differences in pretest scores by measuring the
posttest score relative to each person's pretest score. But a gain score analysis
does not control for the differences in pretest scores between the two groups.
The null hypothesis of no difference in improvement between the treatment and control
groups can be tested by an analysis of variance on the gain scores using treatment
(treatment vs. control) as a between subjects factor. If the treatment main effect is
significant, then we reject the null hypothesis.
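
The two steps above (compute the gain, then run a one-way ANOVA on the gains) can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure used in these notes: the eight scores are hypothetical, and the function names gain_scores and one_way_f are made up for the example.

```python
from statistics import mean

def gain_scores(pre, post):
    """Return posttest - pretest for each participant."""
    return [b - a for a, b in zip(pre, post)]

def one_way_f(groups):
    """One-way ANOVA F ratio for a list of groups (each a list of scores)."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    # Between-groups sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Error sum of squares: squared deviation of each score from its own group mean.
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_error = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_error / df_error)
    return f, df_between, df_error

# Hypothetical trait anxiety scores (not the data analyzed in these notes).
treat_pre, treat_post = [55, 60, 50, 58], [40, 43, 37, 41]
ctrl_pre, ctrl_post = [46, 48, 44, 47], [45, 46, 44, 44]

treat_gain = gain_scores(treat_pre, treat_post)  # negative = anxiety went down
ctrl_gain = gain_scores(ctrl_pre, ctrl_post)
f, df1, df2 = one_way_f([treat_gain, ctrl_gain])
print(f"F({df1}, {df2}) = {f:.2f}")
```

A large F here would lead us to reject the null hypothesis of equal improvement in the two groups, exactly as described above.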

The Error Term


The sums of squares for the within cells error term is the amount of error in the gain
scores. Recall that in an analysis of variance the sums of squares for error is defined
as
SSerror = Σ(Xij - M.j)²
That is, SSerror is the sum of the squared differences between a score and the group
mean for that score. In a gain score analysis Xij is the observed gain score and M.j is the
mean gain score for a particular treatment group. Error will be small to the extent that
the effect of the treatment is the same for each individual (i.e., the gain score is the
same for each person). The error term will be relatively large when the effect of
treatment is not the same for each person. In treatment outcome studies it is unlikely
that the treatment effect will be exactly the same for every individual.
The correlation between pretest and posttest scores within the treatment group provides
an estimate of the consistency of the treatment effect across individuals. If the pretest-
posttest score correlation is high, then the rank ordering of people on the posttest is
similar to the rank ordering of people on the pretest and the effect of treatment is similar for
every individual. In this instance the error term will be relatively small. If the pretest-
posttest correlation is low, then the rank ordering of people on the posttest is not the
same as the rank ordering of people on the pretest, the effect of treatment is not the
same for each individual, and the error term will be relatively large.
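
The link between the pretest-posttest correlation and the size of the error term can be made explicit with the standard formula for the variance of a difference. The symbols are introduced here for illustration and do not appear elsewhere in these notes: G is the gain score, r is the within-group pretest-posttest correlation, and the sigmas are the within-group standard deviations.

```latex
\operatorname{Var}(G)
  = \sigma_{\text{pre}}^{2} + \sigma_{\text{post}}^{2}
    - 2\, r\, \sigma_{\text{pre}}\, \sigma_{\text{post}},
\qquad G = \text{posttest} - \text{pretest}

% Special case: equal pretest and posttest standard deviations (an assumption)
\operatorname{Var}(G) = 2\sigma^{2}(1 - r)
```

As r approaches 1 the variance of the gain scores, and with it the error term, shrinks toward its minimum; as r drops toward 0 the error term grows.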

Running the Analysis


The SPSS syntax commands for running the ANOVA are shown in Table 1. The first
line (UNIANOVA) tells SPSS to run a univariate analysis of variance. The second line
(gain...) defines the dependent variable (gain) and the independent variable (treatgrp).
The third line (/EMMEANS ...) will print the means, standard errors and the 95%
confidence intervals for the means.

Table 1. SPSS Syntax for the ANOVA
UNIANOVA
gain BY treatgrp
/EMMEANS = TABLES(treatgrp).

The Results
The abbreviated analysis of variance output is shown in Table 2. The means, standard
errors, and 95% confidence intervals for each mean are shown in Table 3. The results
can be summarized as follows:
Trait anxiety gain scores (posttest - pretest) were analyzed in an analysis of
variance with treatment group (treatment vs. control) as the independent
variable. The decrease in trait anxiety was greater for participants in the
treatment condition (M = -15.93, SE = 1.46) than for those in the control condition
(M = -1.58, SE = 1.50), F (1, 76) = 47.14, p < .0005.

Table 2. Tests of Between-Subjects Effects
Dependent Variable: COMPUTE gain = tanxpost - tanxpre
Source      Type III Sum of Squares   df   Mean Square   F        Sig.
TREATGRP    4010.641                  1    4010.641      47.140   .000
Error       6466.038                  76   85.079
Total       16705.000                 78

Table 3. Means, Standard errors, and 95% Confidence Interval for the Two Treatment Conditions
Dependent Variable: COMPUTE gain = tanxpost - tanxpre
Condition   Mean      Std. Error   95% CI Lower Bound   95% CI Upper Bound
Treatment   -15.925   1.458        -18.830              -13.020
Control     -1.579    1.496        -4.559               1.401
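
As a quick arithmetic check, the mean squares and the F ratio in Table 2 can be recomputed from the reported sums of squares and degrees of freedom. The short Python sketch below only re-derives numbers already printed in the table; the variable names are chosen for this example.

```python
# Values copied from Table 2 (gain score ANOVA).
ss_treatgrp, df_treatgrp = 4010.641, 1
ss_error, df_error = 6466.038, 76

# A mean square is a sum of squares divided by its degrees of freedom.
ms_treatgrp = ss_treatgrp / df_treatgrp
ms_error = ss_error / df_error

# The F ratio is the treatment mean square over the error mean square.
f_ratio = ms_treatgrp / ms_error
print(f"MS error = {ms_error:.3f}, F({df_treatgrp}, {df_error}) = {f_ratio:.2f}")
```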

Interpretation of the 95% Confidence Interval


The 95% confidence intervals provide additional information about the effectiveness of
the two conditions. Because the gain score is computed as a difference score, no
change between pretest and posttest would be indicated by a gain score of zero. If the
95% confidence interval includes zero, then the gain score mean is not significantly
different from zero.
The 95% confidence interval for the treatment group mean ranges from -18.83 to
-13.02. It does not include zero, so the mean gain is significantly different from zero.
That is, there was significant improvement for participants in the treatment group. The
95% confidence interval for the control group mean ranges from -4.56 to 1.40. It does
include zero, so the mean gain is not significantly different from zero. That is, there was
no significant improvement for participants in the control group. This information could
be added to the description of the results:
Trait anxiety gain scores (posttest - pretest) were analyzed in an analysis of
variance with treatment group (treatment vs. control) as the independent
variable. The decrease in trait anxiety was greater for participants in the
treatment condition (M = -15.93, SE = 1.46) than for those in the control condition
(M = -1.58, SE = 1.50), F (1, 76) = 47.14, p < .0005. Inspection of the 95%
confidence intervals around each mean indicated that there was a significant
decrease in anxiety for participants in the treatment condition, and no significant
decrease in anxiety for participants in the control condition.
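
The confidence interval reasoning can be sketched in Python as mean plus or minus a critical t value times the standard error. The critical value 1.992 is an assumption made for this example (an approximate two-tailed 5% value for 76 degrees of freedom), and the function names are hypothetical; the means and standard errors are the ones reported for the two conditions.

```python
T_CRIT = 1.992  # assumed approximate two-tailed 5% critical t for df = 76

def confidence_interval(m, se, crit=T_CRIT):
    """Return (lower, upper) bounds of the interval m +/- crit * se."""
    return m - crit * se, m + crit * se

def excludes_zero(interval):
    """True when zero lies outside the interval, i.e. the mean gain differs from zero."""
    lower, upper = interval
    return lower > 0 or upper < 0

treatment_ci = confidence_interval(-15.925, 1.458)
control_ci = confidence_interval(-1.579, 1.496)

for label, ci in (("Treatment", treatment_ci), ("Control", control_ci)):
    print(f"{label}: {ci[0]:.2f} to {ci[1]:.2f}, excludes zero: {excludes_zero(ci)}")
```

With these inputs the bounds agree with the reported intervals to two decimal places, and only the treatment interval excludes zero.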

III. Repeated Measures Analysis of Variance


An alternative procedure for analyzing the pretest and posttest scores is to run a 2 x 2
ANOVA with time (pretest vs. posttest) as a within-subjects factor and treatment
(treatment vs. control) as a between subjects factor.
The SPSS syntax commands for running the 2 x 2 ANOVA are shown in Table 4. The
first line (GLM) tells SPSS to run the General Linear Model (GLM) procedure. The
second line (tanxpre ... ) defines the two dependent measures (the pretest score,
tanxpre, and the posttest score, tanxpost) and the independent variable (treatgrp). The
third line (/WSFACTOR... ) tells the GLM procedure that the two dependent measures
should be treated as a within subject factor. The fourth line (/PLOT...) will create a
graphic plot of the means, such as the one shown in Figure 1. The fifth and sixth lines
(/EMMEANS...) will print the treatgrp by time interaction means
[TABLES(treatgrp*time)], and run a simple main effects analysis of the effects of time
within each treatment group [compare(time)] using a Bonferroni correction when testing
the mean differences [ADJ(Bonferroni)]. The seventh line (/WSDESIGN) specifies the
within subject factor, time. The last line (/DESIGN) specifies the between subject
factor, treatgrp.

Table 4. SPSS Syntax for the ANOVA
GLM
tanxpre tanxpost BY treatgrp
/WSFACTOR = time 2 Repeated
/PLOT = PROFILE( time*treatgrp )
/EMMEANS =
TABLES(treatgrp*time) compare(time) ADJ(Bonferroni)
/WSDESIGN = time
/DESIGN = treatgrp .

Overall Analysis
The primary output from the analysis of variance is divided into two tables, the within
subject effects, see Table 5, and the between subjects effects, see Table 6. The output
has been abbreviated somewhat for the purposes of this discussion.
As shown in Table 5, the interaction between treatment and time is significant,
F (1, 76) = 47.14, p < .0005. The interaction will be interpreted with simple main effects
analysis looking at the effects of time within each treatment. The significant time main
effect, F (1, 76) = 70.18, p < .0005, must be interpreted in light of the interaction effect.
As shown in Table 6, the main effect for treatment was not significant, F (1, 76) = 0.14,
p = .714.

Table 5. Tests of Within-Subjects Effects
Source            Type III Sum of Squares   df   Mean Square   F        Sig.
TIME              2985.321                  1    2985.321      70.177   .000
TIME * TREATGRP   2005.321                  1    2005.321      47.140   .000
Error(TIME)       3233.019                  76   42.540

Table 6. Tests of Between-Subjects Effects
Source      Type III Sum of Squares   df   Mean Square   F      Sig.
TREATGRP    19.206                    1    19.206        .135   .714
Error       10800.323                 76   142.110
Conceptually, the interaction term in this 2 x 2 ANOVA can be thought of as a
comparison of the changes from pretest to posttest within each treatment group (see
the formula below). If the changes from pretest to posttest are identical in each group,
e.g., if the improvement is the same for each group, then there is no interaction. If the
change from pretest to posttest is greater in one group than the other group, e.g., if one
group improves more than the other group, then there is an interaction. An interaction
could also occur if one group improved from pretest to posttest while the other group
deteriorated.

Simple Main Effects


The interaction means, standard errors, and 95% Confidence Intervals for the means
are shown in Table 7. The simple main effects of time within each treatment condition
are shown in Table 8.
The interaction can be described by the following two statements:
The trait anxiety scores for participants in the treatment condition decreased from the
pretest (M = 54.35, SE = 1.95) to the posttest (M = 38.43, SE = 2.09), F (1, 76) =
119.23, p < .0005. The trait anxiety scores for participants in the control condition
showed no significant change from the pretest (M = 46.18, SE = 1.00) to the posttest
(M = 44.60, SE = 2.15), F (1, 76) = 1.11, p = .295.
This interpretation involves showing that the change in scores from the pretest to the
posttest was greater for one group than for the other. It seems to me it is an
interpretation that is closely related to how people think about treatment outcome
studies. In general, we want to know if one treatment produced a greater effect than
another treatment.
Note: You need to be careful when you interpret the 95% Confidence Interval
information in SPSS output. The 95% Confidence Intervals shown in Table 7 are based
on the standard deviations of the individual means. They are appropriate for making
comparisons of between subjects means (e.g., the treatment pretest mean vs. the
control pretest mean), but they are too conservative for comparing the within subject
means (e.g., the treatment pretest mean vs. the treatment posttest mean).

IV. Discussion
Alternative explanations
Both the gain score analysis and the repeated measures analysis ignore the
(significant) pretest differences on trait anxiety. Can you think of any alternative
explanations for this outcome that are based on the existing pretest differences? For
example, can the regression towards the mean effect account for the pattern of results?

Comparison of the gain score results with the time by treatment ANOVA results

The F-test value of the treatment main effect in the gain score analysis, F (1, 76) =
47.14, p < .0005, was the same as the F-test value for the time by treatment interaction
in the repeated measures analysis, F (1, 76) = 47.14, p < .0005. Why is this so?
Consider the following description of the time by treatment interaction term -
Time by treatment interaction = (treatment posttest - treatment pretest) - (control
posttest - control pretest)
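
As a worked example of this description, the contrast can be evaluated with the pretest and posttest means reported in the simple main effects section (treatment: 54.35 and 38.43; control: 46.18 and 44.60). This is only a sketch of the arithmetic, not part of the SPSS output.

```latex
\begin{aligned}
\text{Time by treatment interaction}
  &= (38.43 - 54.35) - (44.60 - 46.18) \\
  &= -15.92 - (-1.58) \\
  &= -14.34
\end{aligned}
```

The same value, apart from rounding, is obtained by subtracting the control gain score mean from the treatment gain score mean in Table 3: -15.925 - (-1.579) = -14.346.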

The interaction is a comparison of the differences between the posttest and pretest
scores in each treatment group. As we noted earlier, if the difference is the same in
each treatment group, there is no interaction. If the difference is not the same in each
treatment group, then there is an interaction. Most computer programs such as SPSS
handle the within subjects factor, e.g., time, by literally creating a difference score for
each person by subtracting the posttest score from the pretest score. The test of the
main effect of time is a test of whether the overall mean difference score (across both
treatment groups) is different from zero. The test of the interaction is a test of whether
the mean difference score for the treatment group is different from the mean difference
score for the control group. In the gain score analysis we first computed the difference
between the posttest and pretest scores and then tested whether the differences were
the same for each treatment group. Thus the treatment main effect in the gain score
analysis is the same as the time by treatment interaction in the 2 x 2 ANOVA.
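
The equivalence can also be checked numerically. The Python sketch below uses hypothetical scores with equal group sizes (an assumption that keeps the textbook formulas simple) and computes the F ratio two ways: from a one-way ANOVA on gain scores, and from the interaction and within-subjects error sums of squares of the 2 x 2 layout. It is an illustration of the argument, not a reproduction of what SPSS does internally.

```python
from statistics import mean

# Hypothetical (pretest, posttest) pairs; equal n per group is assumed.
data = {
    "treatment": [(55, 40), (60, 43), (50, 37), (58, 41)],
    "control":   [(46, 45), (48, 46), (44, 44), (47, 44)],
}
n_total = sum(len(rows) for rows in data.values())
df_error = n_total - len(data)

# Route 1: one-way ANOVA on gain scores (posttest - pretest).
gains = {g: [post - pre for pre, post in rows] for g, rows in data.items()}
grand_gain = mean(x for xs in gains.values() for x in xs)
ss_treat = sum(len(xs) * (mean(xs) - grand_gain) ** 2 for xs in gains.values())
ss_err_gain = sum((x - mean(xs)) ** 2 for xs in gains.values() for x in xs)
f_gain = ss_treat / (ss_err_gain / df_error)

# Route 2: time by treatment interaction computed from the raw scores.
grand = mean(x for rows in data.values() for pair in rows for x in pair)
time_means = [mean(pair[t] for rows in data.values() for pair in rows) for t in (0, 1)]
ss_inter = 0.0
ss_err_within = 0.0
for rows in data.values():
    group_mean = mean(x for pair in rows for x in pair)
    for t in (0, 1):
        cell_mean = mean(pair[t] for pair in rows)
        # Interaction: cell mean after removing group and time effects.
        ss_inter += len(rows) * (cell_mean - group_mean - time_means[t] + grand) ** 2
        for pair in rows:
            # Within-subjects error: score after removing person and cell effects.
            ss_err_within += (pair[t] - mean(pair) - cell_mean + group_mean) ** 2
f_inter = ss_inter / (ss_err_within / df_error)

print(f"gain score F = {f_gain:.2f}, interaction F = {f_inter:.2f}")
print(f"SS ratio (gain / interaction) = {ss_treat / ss_inter:.1f}")
```

The two F ratios agree, while each sum of squares from the gain score route is twice the corresponding sum of squares from the 2 x 2 route.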
The interaction term in the ANOVA was significant. The details of the interaction were
analyzed using a simple main effects analysis of the effects of time within each
treatment condition. The simple main effects analysis indicated a significant change
from pretest to posttest in the treatment condition, but not in the control
condition. Similarly, the treatment main effect in the gain score analysis was
significant. The details of the main effect were analyzed using the 95% confidence
intervals for each of the group means. The 95% confidence interval analysis indicated a
significant change from pretest to posttest in the treatment condition, but not in the
control condition.
Technical note. You may have noted that although the F values for the gain score main
effect and ANOVA interaction effect are the same, the sums of squares are not the
same. This is due to the way in which SPSS creates the difference scores. Think of
creating the difference score by multiplying the individual scores by a coefficient (or
weight) called "c" -
Gain = c1*posttest + c2*pretest
When we computed the gain scores c1 was set to +1 and c2 was set to -1, that is, we
simply subtracted the pretest score from the posttest score -
Gain = (+1)*posttest + (-1)*pretest
SPSS "orthonormalizes" the coefficients so that the sum of the squares of the
coefficients is equal to 1.00. The coefficients used by SPSS are as follows -
Gain = (+0.707107)*posttest + (-0.707107)*pretest
If you square each of the coefficients (0.707107² = .5000) and sum them the result is
1.00.
You could check this out for yourself by using the SPSS coefficients to manually create
the gain score and then run the gain score analysis. You would find that both the sums
of squares and the F value from the gain score analysis would equal the sums of
squares and F value from the interaction term in the ANOVA.
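
The effect of the rescaling can be verified from the reported tables without rerunning SPSS. Multiplying every gain score by a constant c multiplies every sum of squares by c squared, so with c = 0.707107 the gain score sums of squares should be halved while the F ratio is unchanged. The Python sketch below applies that rule to the values in Table 2; the variable names are chosen for this example.

```python
import math

c = 1 / math.sqrt(2)  # 0.707107 to six decimals
assert abs(c ** 2 + c ** 2 - 1.0) < 1e-12  # squared coefficients sum to 1

# Sums of squares from the gain score analysis (Table 2).
ss_treatgrp_gain, ss_error_gain, df_error = 4010.641, 6466.038, 76

# Rescaling the gain score by c rescales each sum of squares by c squared (0.5).
ss_treatgrp_scaled = ss_treatgrp_gain * c ** 2
ss_error_scaled = ss_error_gain * c ** 2

# The constant cancels in the ratio, so F is the same before and after rescaling.
f_original = ss_treatgrp_gain / (ss_error_gain / df_error)
f_scaled = ss_treatgrp_scaled / (ss_error_scaled / df_error)

print(f"scaled SS: {ss_treatgrp_scaled:.3f} and {ss_error_scaled:.3f}")
print(f"F before = {f_original:.2f}, F after = {f_scaled:.2f}")
```

The rescaled values agree, within rounding, with the TIME * TREATGRP and Error(TIME) sums of squares in Table 5 (2005.321 and 3233.019).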

Comparison of Gain Scores and Analysis of Covariance


The focus of the difference or gain score analysis is somewhat different from the focus
of the analysis of covariance. The gain score analysis focuses on the change that
occurs from the pretest to the posttest. By analyzing the change scores within each
group you can specify whether both groups improved at different rates, whether one
group improved while the other group showed no improvement, or even whether one
group improved while the other group deteriorated. The analysis of gain scores makes
no assumption about the equivalence of the pretest-posttest regression line. The
interpretation of the gain score analysis becomes somewhat problematic when there are
pretest differences.
The analysis of covariance focuses on the posttest differences between the treatment
groups while holding constant any differences in the pretest scores. But the analysis of
covariance does not tell you anything about how the groups changed from pretest to
posttest. If you have met the assumptions of the analysis of covariance, then it is
generally considered to be a statistically more powerful analysis than a difference or
gain score analysis.
I have recently seen some studies that reported both the difference score analysis and
the analysis of covariance. These papers argued that the effects seen in each
study were robust because both analyses came to the same conclusion.

Additional information on gain score analyses


There is an extensive literature on the analysis of difference or gain scores. It has been
argued that difference or gain scores are inherently unreliable. The reference section
cites additional reading for anyone who might be interested.

V. References
Cattell, R. B. (1983). The clinical use of difference scores: Some psychometric
problems. Multivariate Experimental Clinical Research, 6, 87-98.
Gardner, R. C. (1987). Use of the simple change score in correlational
analysis. Educational and Psychological Measurement, 47, 849-864.
Humphreys, L. G. (1989). Some comments on the relationship between reliability and
statistical power. Applied Psychological Measurement, 13, 419-425.
Karabinus, R. A. (1983). The use of ANOVA, multiple regression, repeated ANOVA,
and effect size. Evaluation Review, 7, 841-850.
Lord, F. M. (1956). The measurement of growth. Educational and Psychological
Measurement, 16, 421-437.
Lord, F. M. (1963). Elementary models for measuring change. In C. W. Harris
(Ed.), Problems in measuring change. Madison, WI: University of Wisconsin Press.
Rogosa, D. R., & Willett, J. B. (1983). Demonstrating the reliability of the difference
score in the measurement of change. Journal of Educational Measurement, 20, 335-343.
Stemmler, G. (1987). Implicit measurement models in methods for scoring
physiological reactivity. Journal of Psychophysiology, 1, 113-125.
Williams, R. H., Zimmerman, D. W., Rich, J. M., & Steed, J. L. (1984). An empirical
study of the relative error magnitude in three measures of change. Journal of
Experimental Education, 53, 55-57.

1999, Lee A. Becker -revised 03/21/00
