Effect Size
COHEN’S D (HEDGES’ G)

• Cohen was one of the pioneers in advocating effect size over statistical significance
• The mean difference itself is a measure of effect size; however, by taking the variability into account we obtain a standardized measure that allows comparison of studies across samples, such that e.g. a d = .20 in this study means the same as a d = .20 reported in another study

COHEN’S D

• Note the similarity to a z-score: we’re talking about a standardized difference
• Cohen defined d for the one-sample case as

  d = (X̄ − µ) / s
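The one-sample formula can be sketched in a few lines of Python (the numbers below are hypothetical, chosen only to show the scale of the result; the function name is my own):

```python
def cohens_d_one_sample(xbar, mu, s):
    """Standardized one-sample effect size: d = (xbar - mu) / s."""
    return (xbar - mu) / s

# Hypothetical example: sample mean 105, null mean 100, sd 15
d = cohens_d_one_sample(105, 100, 15)  # (105 - 100) / 15, i.e. one third of a sd
```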
COHEN’S D

• Now compare to the one-sample t-statistic:

  t = (X̄ − µ) / (s / √N)

• So t = d√N and d = t / √N
• This shows how the test statistic (and its observed p-value) is in part determined by the effect size, but is confounded with sample size
• This means small effects may be statistically significant in many studies (esp. in the social sciences)

COHEN’S D – DIFFERENCES BETWEEN MEANS

• The standard measure for the independent-samples t test:

  d = (X̄1 − X̄2) / sp

• Cohen initially suggested that either sample standard deviation could be used, since the two should be equal under our assumptions (homogeneity of variance); in practice, however, researchers use the pooled standard deviation sp
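The t–d relationship above is easy to demonstrate numerically; a minimal sketch (function names are my own):

```python
import math

def d_from_t_one_sample(t, n):
    """d = t / sqrt(N) for the one-sample case."""
    return t / math.sqrt(n)

def t_from_d_one_sample(d, n):
    """t = d * sqrt(N): the same effect size yields a bigger t with a bigger N."""
    return d * math.sqrt(n)

# A 'small' d = .20 produces t = .20 * sqrt(400) = 4.0 with N = 400:
# significance without a large effect.
```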
EXAMPLE

• Average number of times graduate psych students curse in the presence of others out of total frustration over the course of a day
• Currently taking a statistics course vs. not
• Data: X̄s = 13, s² = 7.5, n = 30; X̄n = 11, s² = 5.0, n = 30

EXAMPLE

• Find the pooled variance and sd
• Equal groups, so just average the two variances, such that sp² = 6.25

  d = (13 − 11) / √6.25 = .8
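The worked example can be checked in Python; this sketch uses the general pooled-variance formula, which reduces to averaging the two variances when the group sizes are equal (function names are my own):

```python
import math

def pooled_sd(var1, var2, n1, n2):
    """Pooled standard deviation; with equal n this averages the variances."""
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return math.sqrt(sp2)

def cohens_d_independent(m1, m2, var1, var2, n1, n2):
    """d = (mean difference) / pooled sd."""
    return (m1 - m2) / pooled_sd(var1, var2, n1, n2)

# The cursing example: means 13 vs 11, variances 7.5 and 5.0, n = 30 each
d = cohens_d_independent(13, 11, 7.5, 5.0, 30, 30)  # sp^2 = 6.25, so d = .8
```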
COHEN’S D – DIFFERENCES BETWEEN MEANS

• Relationship to t:

  d = t √(1/n1 + 1/n2)

• Relationship to rpb:

  d = rpb √[ (n1 + n2 − 2)(1/n1 + 1/n2) / (1 − rpb²) ]

  r = d / √(d² + 1/(pq))

• p and q are the proportions of the total that each group makes up; with equal groups p = .5, q = .5 and the denominator is √(d² + 4), as you will see in some texts

GLASS’S Δ

• For studies with control groups, we use the control group standard deviation in the formula:

  Δ = (X̄1 − X̄2) / s_control

• This does not assume equal variances
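These conversions can be sketched in Python (function names are my own; the r formula is the approximate one given above):

```python
import math

def d_from_t(t, n1, n2):
    """d = t * sqrt(1/n1 + 1/n2) for the independent-samples t."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def r_from_d(d, n1, n2):
    """r = d / sqrt(d^2 + 1/(p*q)); with equal groups this is d / sqrt(d^2 + 4)."""
    p = n1 / (n1 + n2)
    q = n2 / (n1 + n2)
    return d / math.sqrt(d ** 2 + 1 / (p * q))

def glass_delta(m1, m2, sd_control):
    """Glass's delta standardizes by the control group's sd only."""
    return (m1 - m2) / sd_control

# With equal n and d = .8: r = .8 / sqrt(.8**2 + 4), roughly .37
```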
DEPENDENT SAMPLES

• The standard deviation of the difference scores, unlike the previous solution, takes into account the correlated nature of the data:

  s_D² = Var1 + Var2 − 2cov

  d = D̄ / s_D

• Problems remain, however:
  • A standardized mean change in the metric of the difference scores can be much different than in the metric of the original scores
  • Variability of difference scores might be markedly different for change scores compared to original units
  • Interpretation may not be straightforward

DEPENDENT SAMPLES

• Another option is to use a standardizer in the metric of the original scores, which is directly comparable with a standardized mean difference from an independent-samples design:

  d = D̄ / sp    where    sp = s_D / √(2(1 − r12))

• In pre-post types of situations, where one would not expect homogeneity of variance, treat the pretest group of scores as you would the control for Glass’s Δ
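Both standardizers can be sketched as follows (the pre-post numbers are hypothetical; note that with r12 = .5 the two versions coincide, since then sp = s_D):

```python
import math

def sd_difference(var1, var2, cov):
    """sd of the difference scores: s_D^2 = Var1 + Var2 - 2*cov."""
    return math.sqrt(var1 + var2 - 2 * cov)

def d_change_metric(mean_diff, s_d):
    """Standardized mean change in the metric of the difference scores."""
    return mean_diff / s_d

def d_original_metric(mean_diff, s_d, r12):
    """Standardizer in the metric of the original scores: sp = s_D / sqrt(2(1 - r12))."""
    sp = s_d / math.sqrt(2 * (1 - r12))
    return mean_diff / sp

# Hypothetical pre-post data: mean change 2, variances 7.5 and 5.0, covariance 4
s_d = sd_difference(7.5, 5.0, 4.0)
```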
ASSOCIATION

• A measure of association describes the amount of covariation between the independent and dependent variables
• It is expressed in an unsquared standardized metric or its squared value: the former is usually a correlation, the latter a variance-accounted-for effect size
• A squared multiple correlation (R²) calculated in ANOVA is called the correlation ratio or estimated eta-squared (η²)

ANOTHER MEASURE OF EFFECT SIZE

• The point-biserial correlation, rpb, is the Pearson correlation between membership in one of two groups and a continuous outcome variable
• As mentioned, rpb has a direct relationship to t and d
• When squared it is a special case of eta-squared in ANOVA
• For a one-way ANOVA with a two-group factor: eta-squared = R² from a regression approach = rpb²
ETA-SQUARED

• A measure of the degree to which variability among observations can be attributed to conditions
• Example: η² = .50 means 50% of the variability seen in the scores is due to the independent variable

  η² = SStreat / SStotal  ( = rpb² in the two-group case)

ETA-SQUARED

• Relationship to t in the two-group setting:

  η² = t² / (t² + df)
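Both routes to η² can be sketched in a few lines (function names are my own):

```python
def eta_squared_from_ss(ss_treat, ss_total):
    """Proportion of total variability attributable to conditions."""
    return ss_treat / ss_total

def eta_squared_from_t(t, df):
    """Two-group case: eta^2 = t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)

# e.g. t = 2 with df = 58 gives eta^2 = 4/62, about .065
```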
COHEN’S F

• Cohen has a d-type measure for ANOVA called f:

  f = √[ Σ(X̄j − X̄..)² / k ] / √MSe

• Cohen’s f is interpreted as how many standard deviation units the means are from the grand mean, on average; or, if all the values were standardized, f is the standard deviation of those standardized means

RELATION TO PES

• Using partial eta-squared (PES):

  f = √[ PES / (1 − PES) ]

χ²

• Contingency coefficient:  C = √[ χ² / (χ² + N) ]
• Phi coefficient:  φ = √( χ² / N )
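A sketch of f and the χ²-based coefficients (the group means and counts below are hypothetical; the means-based version assumes equal group sizes, so the grand mean is the mean of the means):

```python
import math

def cohens_f_from_means(group_means, ms_error):
    """f: sd of the group means around the grand mean, in error-sd units."""
    k = len(group_means)
    grand = sum(group_means) / k
    sigma_m = math.sqrt(sum((m - grand) ** 2 for m in group_means) / k)
    return sigma_m / math.sqrt(ms_error)

def cohens_f_from_pes(pes):
    """f = sqrt(PES / (1 - PES))."""
    return math.sqrt(pes / (1 - pes))

def contingency_coefficient(chi2, n):
    """C = sqrt(chi^2 / (chi^2 + N))."""
    return math.sqrt(chi2 / (chi2 + n))

def phi_coefficient(chi2, n):
    """phi = sqrt(chi^2 / N)."""
    return math.sqrt(chi2 / n)
```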
CRAMER’S PHI

  φc = √[ χ² / (N(k − 1)) ]

• Again, think of it as a measure of association from 0 (weak) to 1 (strong); that is, phi for 2×2 tables, but it also works for more complex ones
• k is the lesser of the number of rows or columns

ODDS RATIOS

• Especially good for 2×2 tables
• Take a ratio of two outcomes

            Yes    No    Total
  Clinton   564    636   1200
  McCain    450    550   1000

• Although neither gets the majority, we could say which candidate each party’s voters were more likely to vote for
• Odds of Clinton among Dems = 564/636 = .887
• Odds of McCain among Reps = 450/550 = .818
• .887/.818 = 1.08 (the odds ratio): Democrats were 1.08 times as likely to vote for Clinton as Republicans were to vote for McCain
• However, the 95% CI for the odds ratio is .92 to 1.28
• This suggests it would not be wise to predict that either has a better chance at nomination at this point
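The nomination example can be reproduced directly (a minimal sketch; the CI computation is omitted, and function names are my own):

```python
import math

def odds_ratio(a, b, c, d):
    """(a/b) / (c/d) for a 2x2 table of counts."""
    return (a / b) / (c / d)

def cramers_phi(chi2, n, rows, cols):
    """Cramer's phi: sqrt(chi^2 / (N * (k - 1))), k the lesser of rows/cols."""
    k = min(rows, cols)
    return math.sqrt(chi2 / (n * (k - 1)))

# Clinton among Dems vs. McCain among Reps: (564/636) / (450/550)
or_ = odds_ratio(564, 636, 450, 550)  # about 1.08
```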
OTHER CASE-LEVEL EFFECT SIZES

• The common language effect size (McGraw & Wong, 1992) is the predicted probability that a random score from the upper group exceeds a random score from the lower group

  zCL = [0 − (X̄1 − X̄2)] / √(s1² + s2²)

CONFIDENCE INTERVALS FOR EFFECT SIZE

• Effect size statistics such as Hedges’ g and η² have complex distributions
• Traditional methods of interval estimation rely on approximate standard errors assuming large sample sizes
• General (large-sample) form for d/g in the dependent-samples case:

  SE(d/g) = √[ d² / (2(n − 1)) + 2(1 − r) / n ]

• Relation to the noncentrality parameter (ncp) for independent samples:

  d_pop = ncp √[ (n1 + n2) / (n1n2) ]
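Assuming normal distributions, the common language effect size is just a normal-curve probability; a sketch using the error function (the equal-means case gives .5, a coin flip):

```python
import math

def common_language_es(m1, m2, s1, s2):
    """McGraw & Wong's CL: probability that a random score from group 1
    exceeds a random score from group 2, assuming normality."""
    z = (0 - (m1 - m2)) / math.sqrt(s1 ** 2 + s2 ** 2)
    # P(Z > z) via the standard normal CDF, Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Equal means -> CL = .5; a higher mean in group 1 -> CL above .5
```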
LIMITATIONS OF EFFECT SIZE MEASURES

• How to fool yourself with effect size estimation:
• 1. Examine effect size only at the group level
• 2. Apply generic definitions of effect size magnitude without first looking to the literature in your area
• 3. Believe that an effect size judged as “large” according to generic definitions must be an important result and that a “small” effect is unimportant (see Prentice & Miller, 1992)
• 4. Ignore the question of how theoretical or practical significance should be gauged in your research area
• 5. Estimate effect size only for statistically significant results
• 6. Believe that finding large effects somehow lessens the need for replication
• 7. Forget that effect sizes are subject to sampling error
• 8. Forget that effect sizes for fixed factors are specific to the particular levels selected for study
• 9. Forget that standardized effect sizes encapsulate other quantities such as the unstandardized effect size, error variance, and experimental design
• 10. As a journal editor or reviewer, substitute effect size magnitude for statistical significance as a criterion for whether a work is published
• 11. Think that effect size = cause size
RECOMMENDATIONS

• First, recall the APA task force suggestions:
  • Report effect sizes
  • Report confidence intervals
  • Use graphics

RECOMMENDATIONS

• Report and interpret effect sizes in the context of those seen in previous research rather than rules of thumb
• Report and interpret confidence intervals (for effect sizes too), also within the context of prior research
  • In other words, don’t be overly concerned with whether a CI for a mean difference contains zero, but with how it matches up with previous CIs
• Summarize prior and current research with the display of CIs in graphical form (e.g. with Tryon’s reduction)
• Report effect sizes even for nonsignificant results
RESOURCES

• Kline, R. (2004). Beyond Significance Testing. (Much of the material for this lecture came from this book.)
• Rosnow, R., & Rosenthal, R. (2003). Effect sizes for experimenting psychologists. Canadian Journal of Experimental Psychology, 57(3).
• Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher.