Effectsize

The document discusses the concepts of statistical significance and effect size, emphasizing that many researchers misunderstand the meaning of p-values and the importance of effect sizes. It explains various methods for calculating effect sizes, including Cohen's d, eta-squared, and odds ratios, and highlights the need for researchers to consider the practical significance of their findings. Additionally, it provides guidelines for interpreting effect sizes and the relationship between different statistical measures.

Uploaded by

jiyajosevadakkel
Copyright
© © All Rights Reserved
EFFECT SIZE
More to life than statistical significance
Reporting effect size

STATISTICAL SIGNIFICANCE
• Turns out a lot of researchers do not know what precisely p < .05 actually means
— Cohen (1994) Article: The earth is round (p < .05)
• What it means: "Given that H0 is true, what is the probability of these (or more extreme) data?"
• Trouble is most people want to know "Given these data, what is the probability that H0 is true?"

ALWAYS A DIFFERENCE
• With most analyses we commonly define the null hypothesis as 'no relationship' between our predictor and outcome (i.e. the 'nil' hypothesis)
• With sample data, differences between groups always exist (at some level of precision), and correlations are always non-zero.
• Obtaining statistical significance can be seen as just a matter of sample size
• Furthermore, the importance and magnitude of an effect are not accurately reflected because of the role of sample size in the probability value attained

WHAT SHOULD WE BE DOING?
• We want to make sure we have looked hard enough for the difference – power analysis
• Figure out how big the thing we are looking for is – effect size
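The claim that significance can be "just a matter of sample size" can be illustrated with a small sketch. It holds a standardized difference fixed at d = 0.10 and only changes the group size. The function name and the sample sizes are hypothetical, and the p-value uses a large-sample normal approximation to the two-group t test (an assumption made so that only the standard library is needed).

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for two equal-sized groups.

    Assumption: normal approximation to the t test (reasonable for large n).
    """
    z = d * sqrt(n_per_group / 2)  # test statistic implied by d and n
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same small effect (d = 0.10), increasingly large samples
for n in (20, 200, 2000, 20000):
    print(n, round(two_sided_p(0.10, n), 4))
```

The effect size never changes in this loop; only the p-value does, which is why the slides ask for effect size and power analysis alongside the test.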

CALCULATING EFFECT SIZE
• Though different statistical tests have different effect sizes developed for them, the general principle is the same
• Effect size refers to the magnitude of the impact of some variable on another

TYPES OF EFFECT SIZE
• Two basic classes of effect size
• Focused on standardized mean differences for group comparisons
— Allows comparison across samples and variables with differing variance
— Equivalent to z scores
— Note sometimes no need to standardize (units of the scale have inherent meaning)
• Variance-accounted-for
— Amount explained versus the total
• d family vs. r family
• With group comparisons we will also talk about case-level effect sizes
COHEN’S D (HEDGES’ G)
• Cohen was one of the pioneers in advocating effect size over statistical significance
• Defined d for the one-sample case
$$d = \frac{\bar{X} - \mu}{s}$$

COHEN’S D
• Note the similarity to a z-score: we’re talking about a standardized difference
• The mean difference itself is a measure of effect size; however, taking into account the variability, we obtain a standardized measure for comparison of studies across samples, such that e.g. a d = .20 in this study means the same as that reported in another study

COHEN’S D
• Now compare to the one-sample t-statistic
$$t = \frac{\bar{X} - \mu}{s / \sqrt{N}}$$
• So
$$t = d\sqrt{N} \quad \text{and} \quad d = \frac{t}{\sqrt{N}}$$
• This shows how the test statistic (and its observed p-value) is in part determined by the effect size, but is confounded with sample size
• This means small effects may be statistically significant in many studies (esp. social sciences)

COHEN’S D – DIFFERENCES BETWEEN MEANS
• Standard measure for independent samples t test
$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_p}$$
• Cohen initially suggested one could use either sample standard deviation, since they should both be equal according to our assumptions (homogeneity of variance)
— In practice however researchers use the pooled variance
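The one-sample relationship t = d√N can be checked numerically. The sketch below uses made-up scores and a made-up null value (both hypothetical, not from the lecture) and computes d and t separately from their definitions.

```python
from math import sqrt
from statistics import mean, stdev

scores = [12, 15, 11, 14, 13, 16, 12, 15]  # hypothetical sample
mu = 12                                    # hypothetical null-hypothesis mean

n = len(scores)
s = stdev(scores)                          # sample standard deviation

d = (mean(scores) - mu) / s                # standardized difference
t = (mean(scores) - mu) / (s / sqrt(n))    # one-sample t statistic

print(round(d, 3), round(t, 3), round(d * sqrt(n), 3))
```

The last two printed values agree, which is the sense in which t mixes the effect size with the sample size.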

EXAMPLE
• Average number of times graduate psych students curse in the presence of others out of total frustration over the course of a day
• Currently taking a statistics course vs. not
• Data:
$$\bar{X}_s = 13, \quad s^2 = 7.5, \quad n = 30$$
$$\bar{X}_n = 11, \quad s^2 = 5.0, \quad n = 30$$

EXAMPLE
• Find the pooled variance and sd
— Equal groups so just average the two variances such that $s_p^2 = 6.25$
$$d = \frac{13 - 11}{\sqrt{6.25}} = .8$$
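The worked example can be reproduced from the summary statistics alone. This is a minimal sketch: the function names are hypothetical, and the pooled variance is written in its general weighted form, which reduces to the simple average used on the slide when the groups are equal in size.

```python
from math import sqrt


def pooled_variance(var1: float, n1: int, var2: float, n2: int) -> float:
    """Weighted average of two sample variances (weights are df)."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def cohens_d(mean1, mean2, var1, n1, var2, n2) -> float:
    """Mean difference divided by the pooled standard deviation."""
    return (mean1 - mean2) / sqrt(pooled_variance(var1, n1, var2, n2))


# Statistics course group vs. not: means 13 and 11, variances 7.5 and 5.0
sp2 = pooled_variance(7.5, 30, 5.0, 30)
d = cohens_d(13, 11, 7.5, 30, 5.0, 30)
print(sp2, round(d, 2))
```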
COHEN’S D – DIFFERENCES BETWEEN MEANS
• Relationship to t
$$d = t\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
• Relationship to $r_{pb}$
$$d = r_{pb}\sqrt{\left(\frac{n_1 + n_2 - 2}{1 - r_{pb}^2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \qquad r_{pb} = \frac{d}{\sqrt{d^2 + (1/pq)}}$$
p and q are the proportions of the total each group makes up.
If equal groups p = .5, q = .5 and the denominator is $\sqrt{d^2 + 4}$ as you will see in some texts

GLASS’S Δ
• For studies with control groups, we’ll use the control group standard deviation in our formula
$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_{control}}$$
• This does not assume equal variances
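These conversions are short enough to sketch directly. The function names are hypothetical, and the numbers reuse the cursing example (d = .8, n = 30 per group). Treating the "not taking statistics" group as the control for Glass's Δ is an assumption made only for illustration.

```python
from math import sqrt


def d_from_t(t: float, n1: int, n2: int) -> float:
    return t * sqrt(1 / n1 + 1 / n2)


def t_from_d(d: float, n1: int, n2: int) -> float:
    return d / sqrt(1 / n1 + 1 / n2)


def r_from_d(d: float, n1: int, n2: int) -> float:
    """Point-biserial r from d, using the slide's d / sqrt(d^2 + 1/(pq))."""
    p = n1 / (n1 + n2)
    q = 1 - p
    return d / sqrt(d ** 2 + 1 / (p * q))


def glass_delta(mean1: float, mean2: float, sd_control: float) -> float:
    return (mean1 - mean2) / sd_control


t = t_from_d(0.8, 30, 30)
r = r_from_d(0.8, 30, 30)
delta = glass_delta(13, 11, sqrt(5.0))  # assumed control group: variance 5.0
print(round(t, 2), round(r, 3), round(delta, 3))
```

Because the two groups have different variances, Δ comes out somewhat larger than the pooled d of .8, which shows how the choice of standardizer matters.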

COMPARISON OF METHODS

DEPENDENT SAMPLES
• One option would be to simply do nothing different than we would in the independent samples case, and treat the two sets of scores as independent
• Problem:
— Homogeneity of variance assumption may not be tenable
— They aren’t independent

DEPENDENT SAMPLES
• Another option is to obtain a metric with regard to the actual difference scores on which the test is run
• A d statistic for a dependent mean contrast is called a standardized mean change (gain)
• There are two general standardizers:
— A standard deviation in the metric of the
   1. difference scores (D)
   2. original scores

DEPENDENT SAMPLES
• Difference scores
• Mean difference score divided by the standard deviation of the difference scores
$$d = \frac{\bar{D}}{s_D}$$
DEPENDENT SAMPLES
• The standard deviation of the difference scores, unlike the previous solution, takes into account the correlated nature of the data
— Var1 + Var2 – 2covar
$$d = \frac{\bar{D}}{s_D} = \frac{\bar{D}}{s_p\sqrt{2(1 - r_{12})}}$$
• Problems remain however
• A standardized mean change in the metric of the difference scores can be much different than the metric of the original scores
— Variability of difference scores might be markedly different for change scores compared to original units
• Interpretation may not be straightforward

DEPENDENT SAMPLES
• Another option is to use a standardizer in the metric of the original scores, which is directly comparable with a standardized mean difference from an independent-samples design
$$d = \frac{\bar{D}}{s_p}$$
• In pre-post types of situations where one would not expect homogeneity of variance, treat the pretest group of scores as you would the control for Glass’s Δ
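The difference between the standardizers is easiest to see on data. The sketch below uses hypothetical pre/post scores for the same eight people. It checks the identity Var1 + Var2 – 2covar for the variance of the difference scores and then computes the mean change in three metrics.

```python
from math import sqrt
from statistics import mean, variance

pre = [10, 12, 9, 14, 11, 13, 10, 12]     # hypothetical pretest scores
post = [12, 13, 11, 15, 14, 14, 11, 15]   # hypothetical posttest, same people

n = len(pre)
diffs = [b - a for a, b in zip(pre, post)]

# Sample covariance between pre and post
cov = sum((a - mean(pre)) * (b - mean(post)) for a, b in zip(pre, post)) / (n - 1)

var_d = variance(diffs)                                 # variance of difference scores
var_d_check = variance(pre) + variance(post) - 2 * cov  # Var1 + Var2 - 2covar

d_change = mean(diffs) / sqrt(var_d)                    # metric of difference scores
s_p = sqrt((variance(pre) + variance(post)) / 2)        # pooled SD (equal n)
d_original = mean(diffs) / s_p                          # metric of original scores
d_pretest = mean(diffs) / sqrt(variance(pre))           # pretest SD, as for Glass's delta

print(round(d_change, 2), round(d_original, 2), round(d_pretest, 2))
```

With strongly correlated scores, as here, the difference-score metric gives a much larger value than the original-score metric, which is the interpretive problem the slides point to.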

DEPENDENT SAMPLES
Which to use?
• Base it on substantive theoretical interest
• If the emphasis is really on change, i.e. the design is intrinsically repeated measures, one might choose the option of standardized mean change
• In other situations we might retain the standardizer in the original metric, such that the d will have the same meaning as elsewhere

CHARACTERIZING EFFECT SIZE
• Cohen emphasized that the interpretation of effects requires the researcher to consider things narrowly in terms of the specific area of inquiry
• Evaluation of effect sizes inherently requires a personal value judgment regarding the practical or clinical importance of the effects

HOW BIG?
• Cohen (e.g. 1969, 1988) offers some rules of thumb
— Fairly widespread convention now (unfortunately)
• Looked at social science literature and suggested some ways to carve results into small, medium, and large effects
• Cohen’s d values (Lipsey 1990 ranges in parentheses)
— 0.2 small (<.32)
— 0.5 medium (.33-.55)
— 0.8 large (.56-1.2)
• Be wary of “mindlessly invoking” these criteria
• The worst thing that we could do is substitute d = .20 for p = .05, as it would be a practice just as lazy and fraught with potential for abuse as the decades of poor practices we are currently trying to overcome

SMALL, MEDIUM, LARGE?
• Cohen (1969)
• ‘small’
— real, but difficult to detect
— difference between the heights of 15 year old and 16 year old girls in the US
— Some gender differences on aspects of the Wechsler Adult Intelligence Scale
• ‘medium’
— ‘large enough to be visible to the naked eye’
— difference between the heights of 14 & 18 year old girls
• ‘large’
— ‘grossly perceptible and therefore large’
— difference between the heights of 13 & 18 year old girls
— IQ differences between PhDs and college freshmen
ASSOCIATION
• A measure of association describes the amount of the covariation between the independent and dependent variables
• It is expressed in an unsquared standardized metric or its squared value
— the former is usually a correlation*, the latter a variance-accounted-for effect size
• A squared multiple correlation ($R^2$) calculated in ANOVA is called the correlation ratio or estimated eta-squared ($\eta^2$)

ANOTHER MEASURE OF EFFECT SIZE
• The point-biserial correlation, $r_{pb}$, is the Pearson correlation between membership in one of two groups and a continuous outcome variable
• As mentioned, $r_{pb}$ has a direct relationship to t and d
• When squared it is a special case of eta-squared in ANOVA
— A one-way ANOVA for a two-group factor: eta-squared = $R^2$ from a regression approach = $r_{pb}^2$

ETA-SQUARED
• A measure of the degree to which variability among observations can be attributed to conditions
• Example: $\eta^2 = .50$
— 50% of the variability seen in the scores is due to the independent variable.
$$\eta^2 = \frac{SS_{treat}}{SS_{total}} = R_{pb}^2$$

ETA-SQUARED
• Relationship to t in the two group setting
$$\eta^2 = \frac{t^2}{t^2 + df}$$
OMEGA-SQUARED
• Another effect size measure that is less biased and interpreted in the same way as eta-squared
$$\omega^2 = \frac{SS_{treat} - (k - 1)MS_{error}}{SS_{total} + MS_{error}}$$

PARTIAL ETA-SQUARED
• A measure of the degree to which variability among observations can be attributed to conditions, controlling for the subjects’ effect that’s unaccounted for by the model (individual differences/error)
$$\text{partial } \eta^2 = \frac{SS_{treat}}{SS_{treat} + SS_{error}}$$
• Rules of thumb for small, medium, large: .01, .06, .14
• Note that in a one-way design SPSS labels this as PES but it is actually eta-squared, as there is only one factor and no others to partial out
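The variance-accounted-for measures can be computed by hand from the sums of squares. The sketch below uses hypothetical scores for two conditions (so k = 2) and also checks the two-group relationship eta-squared = t²/(t² + df), using the fact that t² equals F when there are two groups.

```python
from statistics import mean

groups = [                       # hypothetical scores, k = 2 conditions
    [14, 12, 15, 13, 11, 13],
    [11, 10, 12, 9, 13, 11],
]
scores = [x for g in groups for x in g]
grand = mean(scores)
k, n_total = len(groups), len(scores)

ss_treat = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
ss_total = ss_treat + ss_error
ms_error = ss_error / (n_total - k)

eta_sq = ss_treat / ss_total
omega_sq = (ss_treat - (k - 1) * ms_error) / (ss_total + ms_error)
partial_eta_sq = ss_treat / (ss_treat + ss_error)  # same as eta_sq with one factor

# Two-group check: t^2 = F = MS_treat / MS_error
df = n_total - k
t_sq = (ss_treat / (k - 1)) / ms_error
eta_sq_from_t = t_sq / (t_sq + df)

print(round(eta_sq, 3), round(omega_sq, 3), round(eta_sq_from_t, 3))
```

Omega-squared comes out smaller than eta-squared on the same data, consistent with its description as the less biased of the two.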
COHEN’S F
• Cohen has a d type of measure for ANOVA called f
$$f = \sqrt{\frac{\sum(\bar{X} - \bar{X}_{..})^2 / k}{MS_e}}$$
• Cohen's f is interpreted as how many standard deviation units the means are from the grand mean, on average, or, if all the values were standardized, f is the standard deviation of those standardized means

RELATION TO PES
Using Partial Eta-Squared
$$f = \sqrt{\frac{PES}{1 - PES}}$$

GUIDELINES
• As eta-squared values are basically $r^2$ values, the feel for what is large, medium and small is similar and depends on many contextual factors
• Small eta-squared and partial eta-squared values might not get the point across (i.e. look big enough to worry about)
— Might transform to Cohen’s f or use so as to continue to speak of standardized mean differences
— His suggestions for f are: .10, .25, .40 which translate to .01, .06, and .14 for eta-squared values
• That is something researchers could overcome if they understood more about effect sizes

OTHER EFFECT SIZE MEASURES
• Measures of association for non-continuous data
— Contingency coefficient
— Phi
— Cramer’s Phi
• d-family
— Odds Ratios
• Agreement
— Kappa
• Case level effect sizes
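The correspondence between the f and eta-squared benchmarks follows from f = sqrt(PES / (1 − PES)). A minimal sketch, with hypothetical function names, that converts in both directions:

```python
from math import sqrt


def f_from_eta_sq(eta_sq: float) -> float:
    """Cohen's f from (partial) eta-squared."""
    return sqrt(eta_sq / (1 - eta_sq))


def eta_sq_from_f(f: float) -> float:
    """Inverse conversion: (partial) eta-squared from Cohen's f."""
    return f ** 2 / (1 + f ** 2)


for eta_sq in (0.01, 0.06, 0.14):
    print(eta_sq, round(f_from_eta_sq(eta_sq), 2))
```

Rounded to two decimals, the three eta-squared benchmarks map onto the f values of .10, .25 and .40 quoted in the guidelines.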

CONTINGENCY COEFFICIENT
$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$
• An approximation of the correlation between the two variables (e.g. 0 to 1)
• Problem: can’t ever reach 1 and its max value is dependent on the dimensions of the contingency table

PHI
$$\phi = \sqrt{\frac{\chi^2}{N}}$$
• Used in 2 X 2 tables as a correlation (0 to 1)
• Problem: gets weird with more complex tables
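The chi-square based measures can be sketched from a raw contingency table. The table below is hypothetical, and the helper name `chi_square` is an assumption for illustration. The last line also computes the generalization that divides by N(k − 1), where k is the lesser of the number of rows or columns, which the lecture calls Cramer's phi.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square and total N for a table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


table = [[30, 10],
         [10, 30]]                     # hypothetical 2 x 2 counts

chi2, n = chi_square(table)
k = min(len(table), len(table[0]))     # lesser of rows or columns

contingency_c = sqrt(chi2 / (chi2 + n))
phi = sqrt(chi2 / n)
cramers_phi = sqrt(chi2 / (n * (k - 1)))

print(round(contingency_c, 3), round(phi, 3), round(cramers_phi, 3))
```

For a 2 X 2 table k − 1 = 1, so Cramer's phi and phi coincide, while C stays below both.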
CRAMER’S PHI
$$\phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}$$
• Again think of it as a measure of association from 0 (weak) to 1 (strong), that is phi for 2X2 tables but also works for more complex ones.
• k is the lesser of the number of rows or columns

ODDS RATIOS
• Especially good for 2X2 tables
• Take a ratio of two outcomes
• Although neither gets the majority, we could say which they were more likely to vote for respectively
• Odds Clinton among Dems = 564/636 = .887
• Odds McCain among Reps = 450/550 = .818
• .887/.818 (the odds ratio) means they’d be 1.08 times as likely to vote Clinton among democrats than McCain among republicans
• However, the 95% CI for the odds ratio is:
— .92 to 1.28
• This suggests it would not be wise to predict either has a better chance at nomination at this point.

            Yes    No    Total
Clinton     564    636   1200
McCain      450    550   1000

• Numbers coming from
— Feb 1-3
— Gallup Poll daily tracking. Three-day rolling average. N=approx. 1,200 Democrats and Democratic-leaning voters nationwide.
— Gallup Poll daily tracking. Three-day rolling average. N=approx. 1,000 Republican and Republican-leaning voters nationwide.
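The odds ratio and its interval can be recomputed from the four counts. The slides do not say how the 95% CI was obtained. The sketch below assumes the common large-sample interval on the log odds ratio (standard error sqrt(1/a + 1/b + 1/c + 1/d)), which appears to reproduce the reported limits.

```python
from math import exp, log, sqrt

a, b = 564, 636   # Clinton among Democrats: yes, no
c, d = 450, 550   # McCain among Republicans: yes, no

odds_clinton = a / b
odds_mccain = c / d
odds_ratio = odds_clinton / odds_mccain

# Assumed method: normal-theory interval on the log odds ratio
se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = 1.96                                   # two-sided 95% normal critical value
lower = exp(log(odds_ratio) - z * se_log_or)
upper = exp(log(odds_ratio) + z * se_log_or)

print(round(odds_clinton, 3), round(odds_mccain, 3), round(odds_ratio, 2))
print(round(lower, 2), round(upper, 2))
```

Because the interval includes 1, the data do not clearly favor either candidate, which matches the conclusion drawn on the slide.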

KAPPA
• Measure of agreement (from Cohen)
• Though two folks (or groups of people) might agree, they might also have a predisposition to respond in a certain way anyway
• Kappa takes this into consideration to determine how much agreement there would be after incorporating what we would expect by chance
— O and E refer to the observed and expected frequencies on the diagonal of the table of Judge 1 vs Judge 2
$$K = \frac{\sum O_D - \sum E_D}{N - \sum E_D}$$

Judgements by clinical psychologists on the severity of suicide attempts by clients.
At first glance one might think (10+5+3)/24 = 75% agreement between the two.
However this does not take into account chance agreement.

                     Judge 1
Judge 2     1          2          3          Totals
1           10 (5.5)   2          0          12
2           1          5 (3.67)   2          8
3           0          1          3 (.88)    4
Totals      11         8          5          24

$$K = \frac{7.95}{13.95} = 57\%$$

CASE-LEVEL EFFECT SIZES
• Indexes such as Cohen’s d and $\eta^2$ estimate effect size at the group or variable level only
• However, it is often of interest to estimate differences at the case level
• Case-level indexes of group distinctiveness are proportions of scores from one group versus another that fall above or below a reference point
• Reference points can be relative (e.g., a certain number of standard deviations above or below the mean in the combined frequency distribution) or more absolute (e.g., the cutting score on an admissions test)
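Kappa can be computed directly from the judges' table, with expected diagonal frequencies taken as row total × column total / N. This is a minimal sketch of that calculation. Note that, using the cell counts and marginal totals as printed, the expected diagonal values come out as 5.5, 2.67 and .83, giving a kappa of .60 rather than the 57% on the slide. The slide's 3.67 and .88 may come from a different version of the table, and that discrepancy is not resolved here.

```python
table = [            # rows: Judge 2 ratings 1-3; columns: Judge 1 ratings 1-3
    [10, 2, 0],
    [1, 5, 2],
    [0, 1, 3],
]

row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
n = sum(row_tot)

# Observed and chance-expected agreement on the diagonal
observed = sum(table[i][i] for i in range(len(table)))
expected = sum(row_tot[i] * col_tot[i] / n for i in range(len(table)))

raw_agreement = observed / n
kappa = (observed - expected) / (n - expected)

print(round(raw_agreement, 2), round(expected, 2), round(kappa, 2))
```

Either way, the chance-corrected figure is well below the raw 75% agreement.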

CASE-LEVEL EFFECT SIZES
• Cohen’s (1988) measures of distribution overlap:
• U1
— Proportion of nonoverlap
— If no overlap then = 1, 0 if all overlap
• U2
— Proportion of scores in lower group exceeded by the same proportion in upper group
— If same means = .5, if all group2 exceeds group 1 then = 1.0
• U3
— Proportion of scores in lower group exceeded by typical score in upper group
— Same range as U2

OTHER CASE-LEVEL EFFECT SIZES
• Tail ratios (Feingold, 1995): Relative proportion of scores from two different groups that fall in the upper extreme (i.e., either the left or right tail) of the combined frequency distribution
• “Extreme” is usually defined relatively in terms of the number of standard deviations away from the grand mean
• Tail ratio > 1.0 indicates one group has relatively more extreme scores
• Here, tail ratio = p2/p1:
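Under the assumption of two normal populations with equal variances, Cohen's overlap measures are simple functions of d. The sketch below uses those normal-theory formulas. The function name is hypothetical, and the results do not apply to clearly non-normal data.

```python
from statistics import NormalDist


def cohen_u(d: float):
    """Return (U1, U2, U3) for a standardized difference d.

    Assumes normal distributions with equal variances.
    """
    cdf = NormalDist().cdf
    u3 = cdf(abs(d))          # share of lower group below the upper group's mean
    u2 = cdf(abs(d) / 2)      # share in upper group exceeding same share in lower
    u1 = (2 * u2 - 1) / u2    # proportion of nonoverlap
    return u1, u2, u3


for d in (0.0, 0.2, 0.5, 0.8):
    u1, u2, u3 = cohen_u(d)
    print(d, round(u1, 3), round(u2, 3), round(u3, 3))
```

At d = 0 the measures take their floor values (0, .5, .5), matching the ranges listed on the slide.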
OTHER CASE-LEVEL EFFECT SIZES
• Common language effect size (McGraw & Wong, 1992) is the predicted probability that a random score from the upper group exceeds a random score from the lower group
$$z_{CL} = \frac{0 - (\bar{X}_1 - \bar{X}_2)}{\sqrt{s_1^2 + s_2^2}}$$
• Find area to the right of that value
— Range .5 – 1.0

CONFIDENCE INTERVALS FOR EFFECT SIZE
• Effect size statistics such as Hedges’ g and $\eta^2$ have complex distributions
• Traditional methods of interval estimation rely on approximate standard errors assuming large sample sizes
• General form for d
$$d \pm t_{cv}(s_d)$$
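The common language effect size is a two-line calculation once the z value is formed. As an illustration, the sketch below applies it to the earlier cursing example (means 13 and 11, variances 7.5 and 5.0). Reusing those numbers here is an assumption made for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def common_language_es(mean_upper, mean_lower, var_upper, var_lower) -> float:
    """Probability that a random upper-group score exceeds a lower-group one.

    Assumes normally distributed scores in both groups.
    """
    z = (0 - (mean_upper - mean_lower)) / sqrt(var_upper + var_lower)
    return 1 - NormalDist().cdf(z)       # area to the right of z


cl = common_language_es(13, 11, 7.5, 5.0)
print(round(cl, 2))
```

Read as a probability, this says a randomly chosen student from the higher-scoring group would out-curse a randomly chosen student from the other group roughly 7 times in 10.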

CONFIDENCE INTERVALS FOR EFFECT SIZE
• Standard errors
For d/g:
$$s_d = \sqrt{\frac{d^2}{2(df_w)} + \frac{N}{n_1 n_2}}$$
For Δ:
$$s_\Delta = \sqrt{\frac{\Delta^2}{2(n_2)} + \frac{N}{n_1 n_2}}$$
Dependent samples, for d/g:
$$s_d = \sqrt{\frac{d^2}{2(n - 1)} + \frac{2(1 - r)}{n}}$$

PROBLEM
• However, CIs formulated in this manner are only approximate, and are based on the central (t) distribution centered on zero
• The true (exact) CI depends on a noncentral distribution and an additional parameter
— Noncentrality parameter
— What the alternative hypothesis distribution is centered on (further from zero, less belief in the null)
• d is a function of this parameter, such that if ncp = 0 (i.e. is centered on the null hypothesis value), then d = 0 (i.e. no effect)
$$d_{pop} = ncp\sqrt{\frac{n_1 + n_2}{n_1 n_2}}$$
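The approximate interval d ± t_cv(s_d) can be sketched for the earlier example (d = .8, n = 30 per group). The function names are hypothetical, and the critical value of about 2.002 for df = 58 is taken from a t table and passed in by hand, since the standard library has no t quantile function. This is the approximate central-t interval only, not the exact noncentral one.

```python
from math import sqrt


def se_d_independent(d: float, n1: int, n2: int) -> float:
    """Approximate standard error of d for two independent groups."""
    n_total = n1 + n2
    df_w = n_total - 2
    return sqrt(d ** 2 / (2 * df_w) + n_total / (n1 * n2))


def se_d_dependent(d: float, n: int, r: float) -> float:
    """Approximate standard error of d for dependent samples."""
    return sqrt(d ** 2 / (2 * (n - 1)) + 2 * (1 - r) / n)


def approx_ci(d: float, se: float, t_cv: float):
    """General form: d plus or minus t_cv times the standard error."""
    return d - t_cv * se, d + t_cv * se


se = se_d_independent(0.8, 30, 30)
lower, upper = approx_ci(0.8, se, t_cv=2.002)  # assumed: 95% two-sided, df = 58
print(round(se, 3), round(lower, 2), round(upper, 2))
```

The interval is wide relative to the estimate, a reminder that effect sizes are themselves subject to sampling error.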

CONFIDENCE INTERVALS FOR EFFECT SIZE
• Similar situation for r and $\eta^2$ effect size measures
• Gist: we’ll need a computer program to help us find the correct noncentrality parameters to use in calculating exact confidence intervals for effect sizes
• Statistica has such functionality built into its menu system while others allow for such intervals to be programmed (even SPSS scripts are available (Smithson))

LIMITATIONS OF EFFECT SIZE MEASURES
• Standardized mean differences:
— Heterogeneity of within-conditions variances across studies can limit their usefulness; the unstandardized contrast may be better in this case
• Measures of association:
— Correlations can be affected by sample variances and whether the samples are independent or not, the design is balanced or not, or the factors are fixed or not
— Also affected by artifacts such as missing observations, range restriction, categorization of continuous variables, and measurement error (see Hunter & Schmidt, 1994, for various corrections)
— Variance-accounted-for indexes can make some effects look smaller than they really are in terms of their substantive significance
LIMITATIONS OF EFFECT SIZE MEASURES
• How to fool yourself with effect size estimation:
• 1. Examine effect size only at the group level
• 2. Apply generic definitions of effect size magnitude without first looking to the literature in your area
• 3. Believe that an effect size judged as “large” according to generic definitions must be an important result and that a “small” effect is unimportant (see Prentice & Miller, 1992)
• 4. Ignore the question of how theoretical or practical significance should be gauged in your research area
• 5. Estimate effect size only for statistically significant results

LIMITATIONS OF EFFECT SIZE MEASURES
• 6. Believe that finding large effects somehow lessens the need for replication
• 7. Forget that effect sizes are subject to sampling error
• 8. Forget that effect sizes for fixed factors are specific to the particular levels selected for study
• 9. Forget that standardized effect sizes encapsulate other quantities such as the unstandardized effect size, error variance, and experimental design
• 10. As a journal editor or reviewer, substitute effect size magnitude for statistical significance as a criterion for whether a work is published
• 11. Think that effect size = cause size

RECOMMENDATIONS
• First recall APA task force suggestions
— Report effect sizes
— Report confidence intervals
— Use graphics

RECOMMENDATIONS
• Report and interpret effect sizes in the context of those seen in previous research rather than rules of thumb
• Report and interpret confidence intervals (for effect sizes too) also within the context of prior research
— In other words don’t be overly concerned with whether a CI for a mean difference doesn’t contain zero but where it matches up with previous CIs
• Summarize prior and current research with the display of CIs in graphical form (e.g. w/ Tryon’s reduction)
• Report effect sizes even for nonsig results
RESOURCES
• Kline, R. (2004). Beyond significance testing.
— Much of the material for this lecture came from this
• Rosnow, R. & Rosenthal, R. (2003). Effect Sizes for Experimenting Psychologists. Canadian JEP 57(3).
• Thompson, B. (2002). What future Quantitative Social Science Research could look like: Confidence intervals for effect sizes. Educational Researcher.
