Quantitative Psychological Research
THE COMPLETE STUDENT’S
COMPANION, 3rd EDITION
David Clark-Carter
Psychology Department,
Staffordshire University
Contents
Detailed contents of chapters ix
Preface xiv
Part 1
Introduction 1
1 The methods used in psychological research 3
Part 2
Choice of topic, measures and research design 19
2 The preliminary stages of research 21
3 Variables and the validity of research designs 37
4 Research designs and their internal validity 49
Part 3
Methods 69
5 Asking questions I: Interviews and surveys 71
6 Asking questions II: Measuring attitudes and meaning 86
7 Observation and content analysis 98
Part 4
Data and analysis 107
8 Scales of measurement 109
9 Summarising and describing data 116
10 Going beyond description 142
11 Samples and populations 151
12 Analysis of differences between a single sample and a population 161
13 Effect size and power 179
14 Parametric and non-parametric tests 187
15 Analysis of differences between two levels of an independent variable 197
16 Preliminary analysis of designs with one independent variable with more than two levels 221
17 Analysis of designs with more than one independent variable 243
18 Subsequent analysis after ANOVA or χ2 259
19 Analysis of relationships I: Correlation 284
Part 5
Sharing the results 389
25 Reporting research 391
Appendixes
I. Descriptive statistics 413
(linked to Chapter 9)
II. Sampling and confidence intervals for proportions 423
(linked to Chapter 11)
III. Comparing a sample with a population 428
(linked to Chapter 12)
IV. The power of a one-group z-test 434
(linked to Chapter 13)
V. Data transformation and goodness-of-fit tests 437
(linked to Chapter 14)
VI. Seeking differences between two levels of an independent variable 444
(linked to Chapter 15)
VII. Seeking differences between more than two levels of an independent variable 468
(linked to Chapter 16)
VIII. Analysis of designs with more than one independent variable 490
(linked to Chapter 17)
IX. Subsequent analysis after ANOVA or χ2 505
(linked to Chapter 18)
X. Correlation and reliability 522
(linked to Chapter 19)
XI. Regression 541
(linked to Chapter 20)
XII. ANCOVA 558
(linked to Chapter 21)
XIII. Evaluation of measures: Item and discriminative analysis, and accuracy of tests 560
(linked to Chapter 6)
XIV. Meta-analysis 564
(linked to Chapter 24)
XV. Probability tables 577
XVI. Power tables 617
XVII. Miscellaneous tables 661
References 671
Effect size
To allow the results of studies to be compared we need a measure which is
independent of sample size. Effect sizes provide such a measure. In future
chapters appropriate measures of effect size will be introduced for each
research design. In this chapter I will deal with the designs described in the
previous chapter, where a mean of a set of scores is being compared with a
population mean, or a proportion from a sample is compared with a proportion
in a population. A number of different versions exist for some effect size
measures. In general I am going to use the measures suggested by Cohen
(1988).
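As a sketch of the one-group measures just mentioned, the following Python fragment computes Cohen's (1988) d for a sample mean compared with a population mean, and Cohen's h (his arcsine-based measure) for comparing proportions. The specific IQ figures used here are illustrative assumptions, not values from the text.

```python
import math

def cohens_d_one_group(sample_mean, pop_mean, pop_sd):
    """Cohen's d for a single sample compared with a population:
    the difference between the means in population SD units."""
    return (sample_mean - pop_mean) / pop_sd

def cohens_h(p1, p2):
    """Cohen's h for comparing two proportions, based on the
    arcsine transformation phi = 2 * arcsin(sqrt(p))."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Assumed illustrative values: a sample mean IQ of 107.5 against a
# population mean of 100 with SD 15 gives d = 0.5, a medium effect by
# Cohen's conventions (0.2 small, 0.5 medium, 0.8 large).
d = cohens_d_one_group(107.5, 100, 15)

# A sample proportion of .65 against a population proportion of .50.
h = cohens_h(0.65, 0.50)
```

Because both measures are expressed in standardised units, they do not change with sample size, which is what makes them usable for comparisons across studies.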
Statistical power
Statistical power is defined as the probability of avoiding a Type II error. The
probability of making a Type II error is usually symbolised by β (the Greek
letter beta). Therefore, the power of a test is 1 − β.
Figure 13.1 represents the situation where two means are being compared:
for example, the mean IQ for the population on which a test has been
standardised (µ1) and the mean for the population of people given special
training to enhance their IQs (µ2). Formally stated, H0 is µ2 = µ1, while the
research hypothesis (HA) is µ2 > µ1. As usual an α-level is set (say, α = .05).
This determines the critical mean, which is the mean IQ, for a given sample
size, which would be just large enough to allow us to reject H0. It determines
β, which will be the area (in the distribution which is centred on µ2) to the left
of the critical mean. It also then determines the power (1 − β), which is the
area (in the distribution which is centred on µ2) lying to the right of the critical
mean.
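The logic in Figure 13.1 can be computed directly. The sketch below, using Python's `statistics.NormalDist`, finds the critical mean under H0 and then the area of the HA distribution lying to its right (1 − β). The numerical values (µ1 = 100, σ = 15, µ2 = 107.5, n = 25) are illustrative assumptions consistent with an IQ test standardised to a mean of 100 and SD of 15.

```python
from statistics import NormalDist

def z_test_power(mu1, mu2, sigma, n, alpha=0.05):
    """Power of a one-tailed one-group z-test (HA: mu2 > mu1).
    Finds the critical mean under H0, then returns 1 - beta, where
    beta is the area of the HA distribution left of the critical mean."""
    se = sigma / n ** 0.5
    critical_mean = mu1 + NormalDist().inv_cdf(1 - alpha) * se
    beta = NormalDist(mu2, se).cdf(critical_mean)
    return 1 - beta

# Assumed values: test standardised with mu1 = 100, sigma = 15; the
# trained population is taken to have mu2 = 107.5 (d = 0.5); n = 25.
power = z_test_power(100, 107.5, 15, 25)  # ~.80
```

With these values the critical mean is about 104.9 and power comes out at roughly .80, which matches the figure quoted later for a medium effect with 25 participants.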
The power we require for a given piece of research will depend on the
aims of the research. Thus, if it is particularly important that we avoid
making a Type II error we will aim for a level of power which is as near 1 as
possible. For example, if we were testing the effectiveness of a drug which
could save lives, we would not want wrongly to conclude that the drug was
ineffective; that is, we would want to avoid a Type II error. However, as you
will see, achieving such a level
estimate of the effect size will be affected by the sample size; the larger the
sample in the pilot study, the more accurate the estimate of the population
value will be. Thirdly, particularly in cases of intervention studies, the
researchers could set a minimum effect size which would be useful. Thus,
clinical psychologists might want to reduce scores on a depression measure
by at least a certain amount, or health psychologists might want to increase
exercise by at least a given amount. A final way around the problem is to
decide beforehand what size of effect they wish to detect based on Cohen’s
classification of effects into small, medium and large. Researchers can decide
that even a small effect is important in the context of their particular study.
Alternatively, they can aim for the necessary power for detecting a medium
or even a large effect if this is appropriate for their research. It should be
emphasised that they are not saying that they know what effect size will be
found but only that this is the effect size that they would be willing to put the
effort in to detect as statistically significant.
I would only recommend this last approach if there is no other indication
of what effect size your research is likely to entail. Nonetheless, this approach
does at least allow you to do power calculations in the absence of any other
information on the likely effect size.
To aid the reader with this approach I have provided power tables in
Appendix XVI for each statistical test and as each test is introduced I will
explain the use of the appropriate table.
Table 13.1 An extract of the power table for a one-group z-test, one-tailed
probability, α = .05 (* denotes that the power is over .995)
The tables show the statistical power which will be achieved for a given
effect size if a given sample size is used.
The table shows that for a one-group z-test with a medium effect size
(d = 0.5), a one-tailed test and an α-level of .05, to achieve power of .80, 25
participants are required.
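The sample size quoted above can also be recovered from the standard normal-theory approximation for a one-tailed one-group z-test, n = ((z_α + z_power) / d)², rather than from the tables. The helper below is a sketch; the function name is mine and the formula is the usual closed-form one, not anything specific to the book's tables.

```python
from math import ceil
from statistics import NormalDist

def required_n(d, power=0.80, alpha=0.05):
    """Smallest n for a one-tailed one-group z-test to reach the
    target power at standardised effect size d, via
    n = ((z_alpha + z_power) / d) ** 2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) / d) ** 2)

n_medium = required_n(0.5)  # 25, agreeing with the table
n_small = required_n(0.2)   # far larger: small effects need big samples
```

Note how steeply the requirement grows as d shrinks: because n depends on 1/d², halving the effect size roughly quadruples the number of participants needed.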
The following examples show the effect which altering one of these variables
at a time has on power. Although these examples are for the one-group
z-test, the power of all statistical tests will be similarly affected by changes in
sample size, effect size, the α-level and, where a one-tailed test is possible for
the given statistical test, the nature of the research hypothesis.
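These one-at-a-time comparisons can be reproduced numerically. The sketch below (an assumed illustration, using the usual normal approximation in which the z-statistic has mean d√n under HA; the two-tailed case ignores the negligible far-tail rejection region) starts from the baseline of d = 0.5, n = 25, one-tailed α = .05, and changes one variable at a time.

```python
from statistics import NormalDist

def power(d, n, alpha=0.05, tails=1):
    """Approximate power of a one-group z-test: under HA the test
    statistic is normal with mean d * sqrt(n), so power is the area
    of that distribution beyond the critical z value."""
    z_crit = NormalDist().inv_cdf(1 - alpha / tails)
    return 1 - NormalDist().cdf(z_crit - d * n ** 0.5)

baseline = power(d=0.5, n=25)                      # ~.80
more_participants = power(d=0.5, n=50)             # larger n: power rises
smaller_effect = power(d=0.2, n=25)                # smaller d: power falls
stricter_alpha = power(d=0.5, n=25, alpha=0.01)    # lower alpha: power falls
two_tailed = power(d=0.5, n=25, tails=2)           # two-tailed: power falls
```

The pattern is general: whatever the statistical test, power increases with sample size and effect size, and decreases when a stricter α-level or a two-tailed test is used.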
Table 13.2 An extract of a power table for one-group t-tests, one-tailed
probability, α = .05 (* denotes that the power is over .995)
Summary
Effect size is a measure of the degree to which an independent variable is seen
to affect a dependent variable or the degree to which two or more variables
are related. As it is independent of the sample size it is useful for comparisons
between studies.
The more powerful a statistical test, the more likely it is that a Type II
error will be avoided. A major contributor to a test’s power is the sample size.
During the design stage researchers should conduct some form of power
analysis to decide on the optimum sample size for the study. If they fail to
achieve statistical significance, then they should calculate what sample size
would be required to achieve a reasonable level of statistical power for the
effect size they have found in their study.
This chapter has shown how to find statistical power using tables. However,
computer programs exist for power analysis. These include G*Power,
which is available via the Internet (see Faul, Erdfelder, Lang, & Buchner,
2007), and SamplePower (Borenstein, Rothstein, & Cohen, 1997).
The next chapter discusses the distinction between two types of statistical
tests: parametric and non-parametric tests.