
Quantitative Psychological Research
THE COMPLETE STUDENT’S COMPANION, 3rd EDITION

David Clark-Carter
Psychology Department,
Staffordshire University
Contents
Detailed contents of chapters ix

Preface xiv

Part 1
Introduction 1
1 The methods used in psychological research 3

Part 2
Choice of topic, measures and research design 19
2 The preliminary stages of research 21
3 Variables and the validity of research designs 37
4 Research designs and their internal validity 49

Part 3
Methods 69
5 Asking questions I: Interviews and surveys 71
6 Asking questions II: Measuring attitudes and meaning 86
7 Observation and content analysis 98

Part 4
Data and analysis 107
8 Scales of measurement 109
9 Summarising and describing data 116
10 Going beyond description 142
11 Samples and populations 151
12 Analysis of differences between a single sample and a population 161
13 Effect size and power 179
14 Parametric and non-parametric tests 187
15 Analysis of differences between two levels of an independent variable 197
16 Preliminary analysis of designs with one independent variable with more than two levels 221
17 Analysis of designs with more than one independent variable 243
18 Subsequent analysis after ANOVA or χ2 259
19 Analysis of relationships I: Correlation 284


20 Analysis of relationships II: Regression 314


21 Analysis of covariance (ANCOVA) 339
22 Screening data 357
23 Multivariate analysis 364
24 Meta-analysis 377

Part 5
Sharing the results 389
25 Reporting research 391

Appendixes
I. Descriptive statistics 413
(linked to Chapter 9)
II. Sampling and confidence intervals for proportions 423
(linked to Chapter 11)
III. Comparing a sample with a population 428
(linked to Chapter 12)
IV. The power of a one-group z-test 434
(linked to Chapter 13)
V. Data transformation and goodness-of-fit tests 437
(linked to Chapter 14)
VI. Seeking differences between two levels of an independent variable 444
(linked to Chapter 15)
VII. Seeking differences between more than two levels of an independent variable 468
(linked to Chapter 16)
VIII. Analysis of designs with more than one independent variable 490
(linked to Chapter 17)
IX. Subsequent analysis after ANOVA or χ2 505
(linked to Chapter 18)
X. Correlation and reliability 522
(linked to Chapter 19)
XI. Regression 541
(linked to Chapter 20)
XII. ANCOVA 558
(linked to Chapter 21)
XIII. Evaluation of measures: Item and discriminative analysis, and accuracy of tests 560
(linked to Chapter 6)
XIV. Meta-analysis 564
(linked to Chapter 24)
XV. Probability tables 577
XVI. Power tables 617
XVII. Miscellaneous tables 661

References 671

Glossary of symbols 677

Author index 678

Subject index 680


13 EFFECT SIZE AND POWER
Introduction
There has been a tendency for psychologists and other behavioural scientists
to concentrate on whether a result is statistically significant, to the exclusion
of any other statistical consideration (Clark-Carter, 1997; Cohen, 1962;
Sedlmeier and Gigerenzer, 1989). Early descriptions of the method of hypothesis testing (e.g. Fisher, 1935) only involved the Null Hypothesis. This chapter deals with the consequences of this approach and describes additional techniques, derived from the ideas of Neyman and Pearson (1933), which enable researchers to make more informed decisions.

Limitations of statistical significance testing
Concentration on statistical significance misses an important aspect of
inferential statistics—statistical significance is affected by sample size. This
has two consequences. Firstly, statistical probability cannot be used as a
measure of the magnitude of a result; two studies may produce very different
results, in terms of statistical significance, simply because they have
employed different sample sizes. Therefore, if only statistical significance is
reported, then results cannot be sensibly compared. Secondly, two studies
conducted in the same way in every respect except sample size may lead to
different conclusions. The one with the larger sample size may achieve a
statistically significant result while the other one does not. Thus, the
researchers in the first study will reject the Null Hypothesis of no effect while
the researchers in the smaller study will reject their research hypothesis.
Accordingly, the smaller the sample size, the more likely we are to commit a
Type II error—rejecting the research hypothesis when in fact it is correct.
Two new concepts will provide solutions to the two problems. Effect size
gives a measure of magnitude of a result which is independent of sample
size. Calculating the power of a statistical test helps researchers decide on the
likelihood that a Type II error will be avoided.

Effect size
To allow the results of studies to be compared we need a measure which is
independent of sample size. Effect sizes provide such a measure. In future
chapters appropriate measures of effect size will be introduced for each
research design. In this chapter I will deal with the designs described in the
previous chapter, where a mean of a set of scores is being compared with a
population mean, or a proportion from a sample is compared with a propor-
tion in a population. A number of different versions exist for some effect size
measures. In general I am going to use the measures suggested by Cohen
(1988).

Comparing two means


In the case of the difference between two means we can use Cohen’s d as the
measure of effect size:
d = (µ2 − µ1) / σ
where µ1 is the mean for one population, µ2 is the mean for the other popula-
tion and σ is the standard deviation for the population (explained below).
To make this less abstract, recall the example, used in the last chapter, in which the IQs of children brought up in an institution are compared with the IQs of children not reared in an institution. Then, µ2 is the mean IQ of the population of children reared in institutions, µ1 is the mean for the population of children not reared in institutions and σ is the standard deviation of IQ scores, which is assumed to be the same for both groups. This assumption will be explained in the next chapter but need not concern us here. Usually, we do not know the values of all the parameters which are needed to calculate an effect size and so we use the equivalent sample statistics. Accordingly, d is a measure of how many standard deviations apart the two means are. Note that although this is similar to the equations for calculating z, given in the last chapter, d fulfils our requirement for a measure which is independent of the sample size.¹

¹ The equation used to calculate effect size is independent of sample size. However, as with any statistic calculated from a sample, the larger the sample, the more accurate the statistic will be as an estimate of the value in the population (the parameter).
In the previous chapter we were told that, as usual, the mean for the
‘normal’ population’s IQ is 100; the standard deviation for the particular test
was 15 and the mean IQ for the institutionalised children was 90. Therefore,
d = (90 − 100) / 15 = −0.67
After surveying published research, Cohen has defined, for each effect size
measure, what constitutes a small effect, a medium effect and a large effect. In
the case of d, a d of 0.2 (meaning that the mean IQs of the groups are just
under ¼ of an SD apart) represents a small effect size, a d of 0.5 (½ an SD)
constitutes a medium effect size and a d of 0.8 (just over ¾ of an SD) would be
a large effect size (when evaluating the magnitude of an effect size, ignore
the negative sign). Thus, in this study we can say that being reared in an
institution has between a medium and a large effect on the IQs of children.
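To make the arithmetic easy to check, here is a minimal sketch in Python of Cohen's d and of his benchmarks. The function names are my own; the formula, the cut-offs and the worked IQ example are those given above.

```python
def cohens_d(sample_mean, population_mean, sd):
    """Cohen's d for a mean compared with a population mean:
    the difference expressed in standard deviation units."""
    return (sample_mean - population_mean) / sd


def label_effect(d):
    """Cohen's (1988) benchmarks for d: 0.2 small, 0.5 medium, 0.8 large.
    The sign is ignored when judging magnitude."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below Cohen's 'small' benchmark"


# The institution example: sample mean 90, population mean 100, SD 15.
d = cohens_d(90, 100, 15)
print(round(d, 2), label_effect(d))  # -0.67 medium (between the medium
                                     # and large benchmarks)
```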

An additional use of effect size is that it allows the results of a number of related studies to be combined to see whether they produce a consistent effect. This technique—meta-analysis—will be dealt with in Chapter 24.

Comparing a proportion from a sample with a population proportion of .5
Cohen (1988) gives the effect size g for this situation, where g = p − π (where p
is the proportion in the sample and π is the proportion in the population). He
defines a g of 0.05 as a small effect, a g of 0.15 as a medium effect and a g of
0.25 as a large effect.
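As a brief sketch in Python (the helper name is mine; the sample proportion of .45 against a population proportion of .5 is the smoking example that reappears later in this chapter):

```python
def effect_size_g(sample_proportion, population_proportion=0.5):
    """Cohen's g: the difference between a sample proportion and a
    population proportion of .5."""
    return sample_proportion - population_proportion


# Benchmarks for g: 0.05 small, 0.15 medium, 0.25 large (sign ignored).
print(effect_size_g(0.45))  # -0.05, a small effect
```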

The importance of an effect size


As Rosnow and Rosenthal (1989) have pointed out, the importance of an
effect size will depend on the nature of the research being conducted. If a
study into the effectiveness of a drug at saving lives found only a small effect
size, even though the lives of only a small proportion of participants were
being saved, this would be an important effect. However, if the study was
into something trivial such as a technique for enhancing performance on a
computer game, then even a large effect might not be considered to be
important. Thus, Cohen’s guidelines for what constitute large, medium and
small effects can be useful to put a result in perspective, particularly in a new
area of research, but they should not be used slavishly without thought to the
context of the study from which an effect size has been derived.

Statistical power
Statistical power is defined as the probability of avoiding a Type II error. The
probability of making a Type II error is usually symbolised by β (the Greek
letter beta). Therefore, the power of a test is 1 − β.
Figure 13.1 represents the situation where two means are being com-
pared: for example, the mean IQ for the population on which a test has been
standardised (µ1) and the mean for the population of people given special
training to enhance their IQs (µ2). Formally stated, H0 is µ2 = µ1, while the
research hypothesis (HA) is µ2 > µ1. As usual an α-level is set (say, α = .05).
This determines the critical mean, which is the mean IQ, for a given sample
size, which would be just large enough to allow us to reject H0. It determines
β, which will be the area (in the distribution which is centred on µ2) to the left
of the critical mean. It also then determines the power (1 − β), which is the
area (in the distribution which is centred on µ2) lying to the right of the critical
mean.
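The geometry just described can be computed directly. Below is a minimal sketch in Python, assuming scipy is available; the training-study mean of 107.5 and the sample size of 25 are illustrative values of my own, not figures from the text.

```python
from math import sqrt

from scipy.stats import norm


def one_group_z_power(mu_1, mu_2, sd, n, alpha=0.05):
    """Power of a one-tailed one-group z-test of H0: mu = mu_1 against
    HA: mu = mu_2 (with mu_2 > mu_1), following the logic of Figure 13.1."""
    se = sd / sqrt(n)                                # standard error of the mean
    critical_mean = mu_1 + norm.ppf(1 - alpha) * se  # just significant sample mean
    beta = norm.cdf(critical_mean, loc=mu_2, scale=se)  # area below it under HA
    return 1 - beta


# IQ test standardised with mean 100 and SD 15; training assumed to raise
# the population mean to 107.5 (i.e. d = 0.5); a sample of 25 people.
print(round(one_group_z_power(100, 107.5, 15, 25), 2))  # 0.8
```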
The power we require for a given piece of research will depend on the
aims of the research. Thus, if it is particularly important that we avoid
making a Type II error we will aim for a level of power which is as near 1 as
possible. For example, if we were testing the effectiveness of a drug which
could save lives we would not want wrongly to reject the research hypothesis
that the drug was effective. However, as you will see, achieving such a level
of power may involve an impractically large sample size.

FIGURE 13.1 A graphical representation of the links between statistical power, α and β [figure not reproduced]

Therefore, Cohen and others recommend, as a rule of thumb, that a reasonable minimum level
of power to aim for, under normal circumstances, is .8. In other words, the
probability of making a Type II error (β) is 1 − power = .2. With an α-level set
at .05 this will give us a ratio of the probabilities of committing a Type I and a
Type II error of 1:4. However, as was stated in Chapter 10, it is possible to set a
different level of α.
Statistical power depends on many factors, including the type of test
being employed, the effect size, the design—whether it is a between-subjects
or a within-subjects design—the α-level set, whether the test is one- or two-
tailed and, in the case of between-subjects designs, the relative size of the
samples.
Power analysis can be used in two ways. It can be used prospectively
during the design stage to decide on the sample size required to achieve a
given level of power. It can also be used retrospectively, once the data have
been collected, to ascertain what power the test had. The more useful
approach is prospective power analysis. Once the design, α-level and tail of
test have been decided, researchers can calculate the sample size they require.
However, they still have the problem of arriving at an indication of the effect
size before they can do the power calculations. But as the study has yet to be
conducted this is unknown.

Choosing the effect size prior to conducting a study


There are at least four ways in which effect size can be chosen before a study
is conducted. Firstly, researchers can look at previous research in the area to
get an impression of the size of effects which have been found. This would be
helped if researchers routinely reported the effect sizes they have found. The
APA’s publication manual (American Psychological Association, 2001)
recommends the inclusion of effect sizes in the report of research. Even where effect sizes themselves are not reported, if the appropriate descriptive statistics have been reported (such as means and SDs), then an effect size can be calculated. Secondly, in the
absence of such information, researchers can calculate an effect size from
the results of their pilot studies. However, as noted earlier, the accuracy of the
13. Effect size and power 183

estimate of the effect size will be affected by the sample size; the larger the
sample in the pilot study, the more accurate the estimate of the population
value will be. Thirdly, particularly in cases of intervention studies, the
researchers could set a minimum effect size which would be useful. Thus,
clinical psychologists might want to reduce scores on a depression measure
by at least a certain amount, or health psychologists might want to increase
exercise by at least a given amount. A final way around the problem is to
decide beforehand what size of effect they wish to detect based on Cohen’s
classification of effects into small, medium and large. Researchers can decide
that even a small effect is important in the context of their particular study.
Alternatively, they can aim for the necessary power for detecting a medium
or even a large effect if this is appropriate for their research. It should be
emphasised that they are not saying that they know what effect size will be
found but only that this is the effect size that they would be willing to put the
effort in to detect as statistically significant.
I would only recommend this last approach if there is no other indication
of what effect size your research is likely to entail. Nonetheless, this approach
does at least allow you to do power calculations in the absence of any other
information on the likely effect size.
To aid the reader with this approach I have provided power tables in
Appendix XVI for each statistical test and as each test is introduced I will
explain the use of the appropriate table.

The power of a one-group z-test to compare a sample mean and population mean
Power analysis for this test is probably the simplest and for the interested
reader I have provided, in Appendix IV, a description of how to calculate the
exact power for the test and how to calculate the sample size needed for a
given level of power. Here I will describe how to use power tables to decide
sample size.
Table 13.1 shows part of the power table for a one-group z-test, from
Appendix XVI. The top row of the table shows effect sizes (d). The first
column shows the sample size. The figures in the body of the table are the statistical power which will be achieved for a given effect size if a given sample size is used.

Table 13.1 An extract of the power table for a one-group z-test, one-tailed probability, α = .05 (* denotes that the power is over .995) [table values not reproduced in this extract]
The table shows that for a one-group z-test with a medium effect size
(d = 0.5), a one-tailed test and an α-level of .05, to achieve power of .80, 25
participants are required.
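For readers without the tables to hand, that entry can be reproduced from the standard normal distribution alone. The closed-form expressions below follow from the definition of the one-group z-test; treat them as a sketch (assuming scipy), not as the procedure used to construct the printed tables.

```python
from math import ceil, sqrt

from scipy.stats import norm


def z_test_power(d, n, alpha=0.05):
    """Power of a one-tailed one-group z-test for effect size d, sample size n."""
    return norm.cdf(d * sqrt(n) - norm.ppf(1 - alpha))


def z_test_sample_size(d, power=0.8, alpha=0.05):
    """Smallest n giving at least the requested power (one-tailed)."""
    return ceil(((norm.ppf(1 - alpha) + norm.ppf(power)) / d) ** 2)


print(round(z_test_power(0.5, 25), 2))  # 0.8, matching Table 13.1
print(z_test_sample_size(0.5))          # 25
```

The same function reproduces the variations discussed next: z_test_power(0.5, 40) gives about .94 and z_test_power(0.7, 25) about .97.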
The following examples show the effect which altering one of these vari-
ables at a time has on power. Although these examples are for the one-group
z-test, the power of all statistical tests will be similarly affected by changes in
sample size, effect size, the α-level and, where a one-tailed test is possible for
the given statistical test, the nature of the research hypothesis.

Sample size and power


Increased sample size produces greater power. If everything else is held con-
stant but we use 40 participants, then power rises to .94.

Effect size and power


The larger the effect size the greater the power. With an effect size of 0.7,
power rises to .97 for 25 participants with a one-tailed α-level of .05.

Research hypothesis and power


A one-tailed test is more powerful than a two-tailed test. A two-tailed test
using 25 people for an effect size of d = 0.5 would have given power of .71 (see
Appendix XVI), whereas the one-tailed version gave power of .8.

α-Level and power


The smaller the α-level, the lower is the power. In other words, if everything
else is held constant, then reducing the likelihood of making a Type I error
increases the likelihood of making a Type II error. Setting α at .01 reduces
power from .8 to .57. On the other hand, setting α at .1 increases power to
nearly .89. These effects can be seen in Figure 13.1; as α gets smaller (the
critical mean moves to the right), 1 − β gets smaller, and as α gets larger (the
critical mean moves to the left), 1 − β gets larger.

The power of a one-group t-test


To assess the power of a one-group t-test or to decide on the sample size
necessary to achieve a desired level of power, use the table provided in
Appendix XVI, part of which is reproduced in Table 13.2. The tables for a one-
group t-test can be read in the same way as those for the one-group z-test. For
example, imagine that researchers wished to detect a small effect size (d = 0.2)
and have power of .8. They would need to have between 150 and 160 partici-
pants in their study. Therefore, as .80 lies midway between .79 and .81, we
can say that the sample would need to be 155 (midway between 150 and 160).
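Exact power for a one-group t-test comes from the noncentral t distribution, which is what such tables are built on. A sketch, assuming scipy (scipy.stats.nct is the noncentral t):

```python
from math import sqrt

from scipy.stats import nct, t


def t_test_power(d, n, alpha=0.05):
    """Power of a one-tailed one-group t-test: the probability that t exceeds
    its critical value when the noncentrality parameter is d * sqrt(n)."""
    df = n - 1
    t_critical = t.ppf(1 - alpha, df)
    return 1 - nct.cdf(t_critical, df, d * sqrt(n))


print(round(t_test_power(0.2, 155), 2))  # about 0.8, matching the interpolation
```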

Table 13.2 An extract of a power table for one-group t-tests, one-tailed probability, α = .05 (* denotes that the power is over .995) [table values not reproduced in this extract]

The power of the z-test to compare a proportion from a sample with a proportion of .5 in the population
In Chapter 12 an example was given of researchers wishing to compare the
proportion of smokers in a sample taken after a ban on smoking in public
places (.45) with the proportion in the population who smoked prior to the
ban (.5). Using the effect size g (the difference between the two proportions)
of .05, we can use Table A16.2 in Appendix XVI, and find that if the
researchers had a directional hypothesis and hence were using a one-tailed
test, with an α-level of .05, then they would need over 600 participants to give
their test power of .8.
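Under the normal approximation this z-test uses (with a null proportion of .5, the standard error is √(.25/n)), the required sample size has a closed form. The following is a sketch under that assumption:

```python
from math import ceil

from scipy.stats import norm


def proportion_test_sample_size(g, power=0.8, alpha=0.05):
    """Sample size for a one-tailed z-test of a sample proportion against .5.
    With standard error sqrt(.25 / n), the test statistic is z = 2 * g * sqrt(n)."""
    return ceil(((norm.ppf(1 - alpha) + norm.ppf(power)) / (2 * g)) ** 2)


print(proportion_test_sample_size(0.05))  # 619, i.e. the 'over 600' above
```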

Prospective power analysis after a study


If a study fails to support the research hypothesis, there are two possible
explanations. The one that is usually assumed is that the hypothesis was in
some way incorrect. However, an alternative explanation is that the test had
insufficient power to achieve statistical significance. If statistical significance
is not achieved I recommend that researchers calculate the sample size which
would be necessary, for the effect size they have found in their study, to
achieve power of .8.
Sometimes researchers, particularly students, state that had they used
more participants they might have achieved a statistically significant result.
This is not a very useful statement, as it will almost always be true if a big
enough sample is employed, however small the effect size. For example, if a
one-group t-test was being used, with α = .05 and the effect size was as small
as d = 0.03, a sample size of approximately 7,000 would give power of .8 for a
one-tailed test. This effect size is achieved if the sample mean is only one-
thirtieth of a standard deviation from the population mean—a difference of
half an IQ point if the sample SD is 15 IQ points.

It is far more useful to specify the number of participants which would be required to achieve power of .8. This would put the results in perspective. If
the effect size is particularly small and the sample size required is vast, then it
questions the value of trying to replicate the study as it stands, whereas if the
sample size were reasonable, then it could be worth replicating the study.
As a demonstration, imagine that researchers conducted a study with 50
participants. They analysed their data using a one-group t-test, with a one-
tailed probability and α-level of .05. The probability of their result having
occurred if the Null Hypothesis was true was greater than .05 and so they had
insufficient information to reject the Null Hypothesis. When they calculated
the effect size, it was found to be d = 0.1. They then went on to calculate the
power of the test and found that it was .17. In other words, the probability of
committing a Type II error was 1 − .17 = .83. Therefore, there was an 83%
chance that they would reject their research hypothesis when it was true.
They were hardly giving it a fair chance. Referring to Table 13.2 again, we can
see that over 600 participants would be needed to give the test power of .8.
The need for such a large sample should make researchers think twice before
attempting a replication of the study. If they wished to test the same hypoth-
esis, they might examine the efficiency of their design to see whether they
could reduce the overall variability of the data.
As a second example, imagine that researchers used 25 participants in a
study but found after analysis of the data that the one-tailed, one-group t-test
was not statistically significant at the .05 level. The effect size was found to be
d = 0.4. The test, therefore, only had power of .61. In order to achieve the
desired power of .8, 40 participants would have to be used. In this example
the effect size is between a small and a medium one and as a sample size of 40
is not unreasonable, it would be worth replicating the study with the
enlarged sample.
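Both retrospective examples can be checked with the noncentral-t sketch given earlier, repeated here so that the snippet stands alone (again assuming scipy; small rounding differences from the printed tables are to be expected):

```python
from math import sqrt

from scipy.stats import nct, t


def t_test_power(d, n, alpha=0.05):
    """One-tailed one-group t-test power, as in the earlier sketch."""
    df = n - 1
    return 1 - nct.cdf(t.ppf(1 - alpha, df), df, d * sqrt(n))


print(round(t_test_power(0.1, 50), 2))  # about 0.17: the first example
print(round(t_test_power(0.4, 25), 2))  # about 0.61: the second example
print(round(t_test_power(0.4, 40), 2))  # about 0.8: 40 participants suffice
```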

Summary
Effect size is a measure of the degree to which an independent variable is seen
to affect a dependent variable or the degree to which two or more variables
are related. As it is independent of the sample size it is useful for comparisons
between studies.
The more powerful a statistical test, the more likely it is that a Type II
error will be avoided. A major contributor to a test’s power is the sample size.
During the design stage researchers should conduct some form of power
analysis to decide on the optimum sample size for the study. If they fail to
achieve statistical significance, then they should calculate what sample size
would be required to achieve a reasonable level of statistical power for the
effect size they have found in their study.
This chapter has shown how to find statistical power using tables. How-
ever, computer programs exist for power analysis. These include G*Power,
which is available via the Internet (see Faul, Erdfelder, Lang, & Buchner,
2007), and SamplePower (Borenstein, Rothstein, & Cohen, 1997).
The next chapter discusses the distinction between two types of statistical
tests: parametric and non-parametric tests.
