
Lessons in biostatistics

Understanding the effect size and its measures


Cristiano Ialongo*1,2
1Laboratory Medicine Department, “Tor Vergata” University Hospital, Rome, Italy
2Department of Human Physiology and Pharmacology, University of Rome Sapienza, Rome, Italy

*Corresponding author: cristiano.ialongo@gmail.com

Abstract
The evidence-based medicine paradigm demands scientific reliability, but modern research seems at times to overlook it. Power analysis represents a way to show the meaningfulness of findings, beyond the much-emphasized aspect of statistical significance. Within this statistical framework, the estimation of the effect size represents a means to show the relevance of the evidence produced through research. In this regard, this paper presents and discusses the main procedures for estimating the size of an effect with respect to the specific statistical test used for hypothesis testing. Thus, this work can be seen as an introduction and a guide for the reader interested in the use of effect size estimation in their scientific endeavour.
Key words: biostatistics; statistical data analysis; statistical data interpretation

Received: February 05, 2016 Accepted: April 26, 2016

Introduction
In recent times there seems to be a tendency to report ever fewer negative findings in scientific research (1). To see the glass "half full", we might say that our capability to make findings has increased over the years, with every researcher having a high average probability of showing at least something through their own work. However, and unfortunately, it is not so. As long as we are accustomed to thinking in terms of "significance", we tend to perceive negative findings (i.e. absence of significance) as something negligible, not worth reporting or mentioning at all. Indeed, as we often feel insecure about our means, we tend to hide them, fearing to put our scientific reputation at stake.

Actually, such an extreme interpretation of significance does not correspond to what was formerly meant by those who devised the hypothesis testing framework as a tool for supporting the researcher (2). In this paper, we aim to introduce the reader to the concept of estimation of the size of an effect, that is, the magnitude of a hypothesis as observed through its experimental investigation. Hereby we will provide the means to understand how to use it properly, as well as the reason why it helps in giving an appropriate interpretation to the significance of a finding. Furthermore, a comprehensive set of commented examples makes it possible to better understand the actual application of what is explained in the text.

Technical framework

Stated simply, "significance" is the magnitude of the evidence which the scientific observation produces regarding a certain postulated hypothesis. Such a framework basically relies on two assumptions: 1) the observation is intimately affected by some degree of randomness (a heritage of the theory of error from which statistics derives), and 2) it is always possible to figure out the way the observation would look when the phenomenon is completely absent (a derivation of the "goodness of fit" approach of Karl Pearson, the "common ancestor" of modern statisticians). Practically, the evidence can be quantified through the hypothesis testing procedure, which we owe to Ronald Fisher on one hand, and to Jerzy Neyman and Egon Pearson (son of Karl) on the other (2). The result of hypothesis testing is the probability (or P-value) that the observation was shaped by chance (the so-called "null hypothesis") rather than by the phenomenon (the so-called "alternative hypothesis"). The threshold at which the P-value is considered small enough to exclude the effect of chance corresponds to statistical significance. Thus, what is the sense of a non-significant result? There are two possibilities:
• there is actually no phenomenon and we observe just the effect of chance, or
• a phenomenon does exist but its small effect is overwhelmed by the effect of chance.

The second possibility poses the question of whether the experimental setting actually makes it possible to show a phenomenon when there really is one. In order to answer it, we need to quantify how large (or small) the expected effect produced by the phenomenon is with respect to the observation through which we aim to detect it. This is the so-called effect size (ES).

P-value limitations

A pitfall in the hypothesis testing framework is that it assumes the null hypothesis is always determinable, which means it is exactly equal to a certain quantity (usually zero). From a practical standpoint, achieving such precision in observation would mean getting results which are virtually identical to each other, since any minimal variability would produce a deviation from the null hypothesis prediction. Therefore, with a large number of trials, such dramatic precision would make the testing procedure overly sensitive to trivial differences, making them look significant even when they are not (3). On an intuitive level, let's imagine that our reference value is 1 and we set the precision level at 10%. With the resulting precision range of 0.9–1.1, a 0.1% difference in any actual measure would be shown as not significant, since 1 + 0.1% = 1.001 < 1.1. Conversely, increasing precision up to 0.01% would give a range of 0.9999–1.0001, thus showing a 0.1% difference as significant, since 1.001 > 1.0001. With respect to experimental designs, we can assume that each observation taken on a case of the study population corresponds to a single trial. Therefore, enlarging the sample would increase the probability of getting a small P-value even with a very faint effect. As a drawback, especially with biological data, we would risk mistaking the natural variability, or even the measurement error, for a significant effect.

Development of ES measures

The issue of achieving meaningful results is measuring, or rather estimating, the size of the effect. A concept which could seem puzzling is that the effect size needs to be dimensionless, as it should deliver the same information regardless of the system used to take the observations. Indeed, changing the system should not influence the size of the effect and in turn its measure, as this would disagree with the objectiveness of scientific research.

That said, it is noteworthy that much of the work regarding ES measurement was pioneered by the statistician and psychologist Jacob Cohen, as a part of the paradigm of meta-analysis he developed (4,5). However, Cohen did not create anything which was not already in statistics, but rather gave a means to spread the concepts of statistical power and size of an effect among non-statisticians. It should be noticed that some of the ES measures he described were already known to statisticians, as was the case for Pearson's product-moment correlation coefficient (formally known as r, eq. 2.1 in Table 1) or Fisher's variance ratio (known as eta-squared, eq. 3.4 in Table 1). Conversely, he derived some other measures directly from certain already known test statistics, as with his "d" measure (eq. 1.1 in Table 1), which can be considered as stemming strictly from the z-statistic and the Student's t-statistic (6).

A relevant aspect of ES measures is that they can be classified according to the way they capture the nature of the effect they measure (5):

Biochemia Medica 2016;26(2):150–63 http://dx.doi.org/10.11613/BM.2016.015
©Copyright by Croatian Society of Medical Biochemistry and Laboratory Medicine. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc-nd/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
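The sensitivity of the P-value to sample size described in the P-value limitations section can be sketched numerically. The following Python snippet is an illustration added for this purpose (the paper's own examples use MS Excel); the z-test normal approximation and the function name are assumptions, not part of the original text. It holds a very faint standardized difference fixed and lets the group size grow:

```python
from statistics import NormalDist

def z_test_p(d, n_per_group):
    # Two-sided p-value for a standardized mean difference d observed
    # with n cases per group (z-test, i.e. normal approximation).
    z = d * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

d = 0.05  # a very faint effect, held constant throughout
for n in (100, 1000, 10000, 100000):
    print(n, round(z_test_p(d, n), 4))
```

With the effect unchanged, the P-value drifts below 0.05 somewhere between 1000 and 10,000 cases per group: "significance" is reached by enlarging the sample alone, which is exactly the pitfall described above.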
Ialongo C. Guide to effect size calculations

Table 1. Effect size measures

(The formula images were lost in this copy; the formulas below are the standard forms of each measure, reconstructed using the symbols defined in the legend.)

Measure | Test | Formula | Number
Cohen's d | t-test with equal sample size and variance | d = (x̄1 − x̄2) / √((s1² + s2²) / 2) | 1.1
Hedge's g | t-test on small samples / unequal size | g = (x̄1 − x̄2) / √(((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)) | 1.2
Glass's Δ | t-test with unequal variances / control group | Δ = (x̄1 − x̄2) / scontrol | 1.3
Glass's Δ* | t-test with small control group | as Δ, with Bessel's correction of scontrol | 1.4
Steiger's ψ (psi) | omnibus effect (ANOVA) | ψ = √((1/(k − 1)) Σ (x̄j − GM)² / MSE) | 1.5
Pearson's r | linear correlation | r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² Σ(y − ȳ)²) | 2.1
Spearman's ρ (rho) | rank correlation | ρ = 1 − 6Σ(u − v)² / (n(n² − 1)) | 2.2
Cramer's V | nominal association (2 x 2 table) | V = √(χ² / (N(m − 1))) | 2.3
φ (phi) | Chi-square (2 x 2 table) | φ = √(χ² / N) | 2.4
r² | simple linear regression | squared Pearson's r | 3.1
adjusted r² | multiple linear regression | 1 − (1 − r²)(N − 1) / (N − p − 1) | 3.2
Cohen's f² | multiple linear regression | f² = r² / (1 − r²) | 3.3a
Cohen's f² | n-way ANOVA | f² = η² / (1 − η²) | 3.3b
η² (eta-squared) | 1-way ANOVA | η² = SSfactor / SStotal | 3.4
partial η² | n-way ANOVA | partial η² = SSfactor / (SSfactor + SSerror) | 3.5
ω² (omega-squared) | 1-way / n-way ANOVA | ω² = (SSfactor − (k − 1)MSE) / (SStotal + MSE) | 3.6
Odds ratio (OR) | 2 x 2 table | OR = (x1y1 · x0y0) / (x1y0 · x0y1) | 4.1a
Odds ratio (OR) | logistic regression | OR = e^β | 4.1b


Effect size (ES) measures and their equations are presented with the corresponding statistical test and the appropriate condition of application to the sample; the enumeration (Number) refers to their discussion within the text.

x̄, ȳ – average of group / sample. x, y – variable (value). GM – grand mean (ANOVA). s² – sample variance. n – sample cases. N – total cases. Σ – summation. χ² – chi-square (statistic). u, v – ranks. m – minimum number of rows / columns. p – number of predictors (regression). k – number of groups (ANOVA). MSE – mean squared error = SSerror / (N − k). Bessel's correction – [n / (n − 1)]. SSfactor – factor sum of squares (variance between groups). SSerror – error sum of squares (variance within groups). SStotal – total sum of squares (total variance). xmyn – cell count (2 x 2 table odds ratio). e – constant (Euler's number). β – exponent term (logistic function).

• through a difference, change or offset between two quantities, similarly to what is assessed by the t-statistic;
• through an association or variation between two (or more) variates, as in the correlation coefficient r.

The choice of the appropriate kind of ES measure is dictated by the test statistic the hypothesis testing procedure relies on. Indeed, the latter determines the experimental design adopted and in turn the way the effect of the phenomenon is observed (7). Table 1, which provides the most relevant ES measures, gives each of them alongside the test statistic framework it relates to. In some situations it is possible to choose between several alternatives, in that almost all ES measures are related to each other.

Difference-based family

In the difference-based family the effect is measured as the size of the difference between two series of values of the same variable, taken with respect to the same or different samples. As we saw in the previous section, this family relies on the concept of standardized difference formerly expressed by the t-statistic. The prototype of this family was provided by Cohen through the uncorrected standardized mean difference or Cohen's d, whose equation is reported in Table 1 (eq. 1.1; and Example 1). Cohen's d relies on the pooled standard deviation (the denominator of the equation) to standardize the measure of the ES; it assumes the groups have (roughly) equal size and variance. When the deviation from this assumption is not negligible (e.g. one group doubles the other), it is possible to account for it using Bessel's correction (Table 1) for the biased estimation of the sample standard deviation. This gives rise to the Hedge's g (eq. 1.2 in Table 1 and Example 1), which is a standardized mean difference corrected through the pooled weighted standard deviation (8).

A particular case of ES estimation involves experiments in which one of the two groups acts as a control. Since we presume that any measure on the control is untainted by the effect, we can use its standard deviation to standardize the difference between averages in order to minimize the bias, as is done in Glass's delta (Δ) (eq. 1.3 in Table 1 and Example 1) (9). A slight modification of Glass's Δ (termed Glass's Δ*) (eq. 1.4 in Table 1), which embodies Bessel's correction, is useful when the control sample size is small (e.g. fewer than 20 cases) and this sensibly affects the estimate of the control's standard deviation.

It is possible to extend the framework of the difference family to more than two groups, correcting the overall difference (the difference of each observation from the average of all observations) by the number of groups considered. From a formal point of view this corresponds to the omnibus effect of a one-factor analysis of variance design with fixed effect (1-way ANOVA). Such an ES measure is known as Steiger's psi (ψ) (eq. 1.5 in Table 1 and Example 2) or root mean square standardized effect (RMSSE) (10,11).

As a concluding remark for this section, we would mention that it is possible to compute Cohen's d


Example 1
Two groups of subjects, 30 people each, are enrolled to test serum blood glucose after the administration of an oral hypoglycemic drug. The study aims to assess whether a race factor might have an effect on the drug. Laboratory analyses show a blood glucose concentration of 7.8 ± 1.3 mmol/L and 7.1 ± 1.1 mmol/L, respectively. According to eq. 1.1 in Table 1, the ES measure is:

d = (7.8 − 7.1) / √((1.3² + 1.1²) / 2) = 0.7 / 1.204 = 0.581

For instance, the power analysis shows that such a cohort (n1 + n2 = 60) would give a 60% probability of detecting an effect as large as 0.581 (that is, the statistical power). Therefore we shall question whether the study was potentially inconclusive with respect to its objective.

In another experimental design on the same study groups, the first one is treated with a placebo instead of the hypoglycemic drug. Moreover, this group's size is doubled (n = 60) in order to increase the statistical power of the study. For recalculating the effect size, Glass's Δ is used instead, as the first group here clearly acts as a control. Knowing that its average glucose concentration is 7.9 ± 1.2 mmol/L, according to eq. 1.3 it is:

Δ = (7.9 − 7.1) / 1.2 = 0.67

The calculated ES falls close to Cohen's d. However, when the statistical power is computed based on the new sample size (N = 90) and ES estimate, the experimental design shows a power of 83.9%, which is fairly adequate. It is noteworthy that the ES calculated through eq. 1.2 (Hedge's g, with the pooled weighted standard deviation of the two groups) gives:

g = (7.9 − 7.1) / √((59 × 1.2² + 29 × 1.1²) / 88) = 0.8 / 1.168 = 0.685
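The three difference-family measures above can be sketched in a few lines of Python (an illustrative helper script, not part of the paper; function names and rounding are assumptions), using the figures of Example 1:

```python
from math import sqrt

def cohens_d(m1, s1, m2, s2):
    # eq. 1.1 -- pooled SD; assumes (roughly) equal group sizes and variances
    return (m1 - m2) / sqrt((s1 ** 2 + s2 ** 2) / 2)

def hedges_g(m1, s1, n1, m2, s2, n2):
    # eq. 1.2 -- pooled weighted SD with Bessel's correction
    s_w = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (m1 - m2) / s_w

def glass_delta(m_treated, m_control, s_control):
    # eq. 1.3 -- standardizes by the control group's SD only
    return (m_treated - m_control) / s_control

# First design of Example 1: 7.8 +/- 1.3 vs 7.1 +/- 1.1 mmol/L, 30 cases each
print(round(cohens_d(7.8, 1.3, 7.1, 1.1), 3))          # 0.581
# Second design: placebo control 7.9 +/- 1.2 (n = 60) vs 7.1 +/- 1.1 (n = 30)
print(round(glass_delta(7.1, 7.9, 1.2), 3))            # -0.667
print(round(hedges_g(7.9, 1.2, 60, 7.1, 1.1, 30), 3))  # 0.685
```

Note that the sign of Δ merely reflects the direction of the difference (treated minus control); its magnitude matches the hand calculation.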

Example 2
A cohort of 45 subjects is randomized into three groups (k = 3) of 15 subjects each in order to investigate the effect of different hypoglycemic drugs. Particularly, the blood glucose concentration is 8.6 ± 0.2 mmol/L for the placebo group, 7.8 ± 0.2 mmol/L for the drug 1 group and 6.8 ± 0.2 mmol/L for the drug 2 group. In order to calculate Steiger's ψ, the data available through the ANOVA summary and table were obtained using MS Excel's Analysis ToolPak add-in (found under Data → Data Analysis → ANOVA: Single Factor):

ANOVA SUMMARY
Groups | Count | Sum | Average | Variance
Drug 1 | 15 | 116.3 | 7.8 | 0.06
Drug 2 | 15 | 102.3 | 6.8 | 0.03
Placebo | 15 | 128.3 | 8.6 | 0.02


ANOVA TABLE
Variance component | SS | DF | MS | F | P | F crit
Between groups (SSfactor) | 22.5 | 2 | 11.24 | 288 | < 0.01 | 3.2
Within groups (SSerror) | 1.6 | 42 | 0.04 | | |
Total (SStotal) | 24.1 | 44 | | | |
SS – sum of squares, DF – degrees of freedom, MS – mean squares.

Notice that the ANOVA summary displays descriptive statistics for the groups in the design, while the ANOVA table gives information regarding the results of the ANOVA calculations and statistical analysis. Particularly with respect to power analysis calculations (see Example 4 later on), it shows the value of the variance components: the between-groups component (corresponding to the factor's sum of squares, SSfactor), the within-groups component (corresponding to the error's sum of squares, SSerror) and the total variance (given by the summation of the factor's and error's sums of squares).

Considering that the grand mean (the average of all the data taken as a single group) is 7.7 mmol/L, the formula (eq. 1.5) becomes:

ψ = √(((7.8 − 7.7)² + (6.8 − 7.7)² + (8.6 − 7.7)²) / ((3 − 1) × 0.04)) = 4.51

From the ANOVA table we notice that this design had a very large F-statistic (F = 288) which resulted in a P-value far below 0.01, in agreement with an effect size as large as 4.51.
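The same calculation can be sketched in Python (an illustrative snippet, not from the paper; the function name is an assumption), taking the group averages and the MS-within from the ANOVA tables above:

```python
from math import sqrt

def steigers_psi(group_means, mse, k):
    # Root mean square standardized effect (RMSSE) for a 1-way ANOVA
    # with equal group sizes; mse estimates the within-group variance.
    gm = sum(group_means) / len(group_means)  # grand mean
    ss = sum((m - gm) ** 2 for m in group_means) / mse
    return sqrt(ss / (k - 1))

# Group averages and MS-within from Example 2
psi = steigers_psi([7.8, 6.8, 8.6], mse=0.04, k=3)
print(round(psi, 2))  # 4.51
```

The grand mean is computed as the average of the group averages, which is valid here because the groups have equal size.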

also for non-Student’s family test as the F-test, as which resides in the Pearson’s product moment
well as for non-parametric tests like Chi-square or correlation coefficient, which is indeed the pro-
the Mann-Whitney U-test (12-14). genitor of this group (eq. 2.1 in Table 1 and Exam-
ple 3). In this regard it should be reminded that by
definition the correlation coefficient is nothing but
Association – based family
the joint variability of two quantities around a
In the association-based family the effect is meas- common focal point, divided by the product of
ured as the size of variation between two (or more) the variability of each quantity around its own bar-
variables observed in the same or in several differ- ycentre or average value (15). Therefore, if the two
ent samples. Within this family it is possible to do a variables are tightly associated to each other, their
further distinction, based on the way the variabili- joint variability equals the product of their individ-
ty is described. ual variabilities (which is the reason why r can range
only between 1 and -1), and the effect can be seen
Associated variability: correlation as what forces the two variables to behave so.
In the first sub-family, variability is shown as a joint When a non-linear association is thought to be
variation of the variables considered. Under a for- present, or the continuous variable were discre-
mal point of view it is nothing but the concept tized into ranks, it is possible to use the Spear-


Example 3
The easiest way to understand how the ES measured through r works is to look at scattered data:

[Figure: two scatter plots of Variable 2 (Y) against Variable 1 (X). Panel A: r = 0.82, points lying close to a straight line. Panel B: r = 0.006, points randomly scattered.]

In both panels the dashed lines represent the average value of X (vertical) and of Y (horizontal). In panel A the correlation coefficient is close to 1 and the data give the visual impression of lying on a straight line. In panel B, the data of Y were just randomly reordered with respect to X, resulting in a coefficient r very close to zero although the average value of Y was unchanged. Indeed, the data appear randomly scattered with no pattern. Therefore the effect which made X and Y behave similarly in panel A was cancelled by the random sorting of Y, as randomness is by definition the absence of any effect.
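The same demonstration can be sketched in code (Python, added here as an illustration; a fixed reordering stands in for the random shuffle of panel B so that the result is reproducible):

```python
from math import sqrt

def pearson_r(x, y):
    # Joint variability around the means, divided by the product of
    # each variable's own variability (eq. 2.1 in Table 1).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = list(range(20))
y = [2 * v + 1 for v in x]  # a perfect linear relation: r = 1
# The same y values reordered (alternating extremes): the average of y is
# unchanged, but the pairing with x, and hence the effect, is destroyed.
y2 = [39, 1, 37, 3, 35, 5, 33, 7, 31, 9, 29, 11, 27, 13, 25, 15, 23, 17, 21, 19]
print(round(pearson_r(x, y), 3))   # 1.0
print(round(pearson_r(x, y2), 3))  # -0.075
```

Reordering y leaves its mean and variance untouched, yet drives r toward zero, exactly as in panel B of the figure.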

man’s rho (ρ) instead (eq. 2.2 in Table 1) (6). Alter- e, which suits the so-called general linear models
natively, for those variable naturally nominal, if a (GLM), to which ANOVA, linear regression, and any
two-by-two (2 x 2) table is used, it is possible to kind of statistical model which can be considered
calculate the ES through the coefficient phi ( ) (eq. stemming from that linear function belong. Partic-
2.4 in Table 1). In case of unequal number of rows ularly, in GLM the X is termed the design (one or a
and columns, instead of eq. 2.4, the Cramer’s V can set of independent variables), b weight and e the
be used (eq. 2.3 in Table 1), in which a correction random normal error. In general, such models aim
factor for the unequal ranks is used, similarly to to describe the way Y varies according to the way
what is done with the difference family. X changes, using the association between varia-
bles to predict how this happens with respect to
Explained variability: general linear models their own average value (15). In linear regression,
the variables of the design are all continuous, so
In the second sub-family the variability is shown that estimation is made point-to-point between X
through a relationship between two or more vari- and Y. Conversely, in ANOVA, the independent var-
ables. Particularly, it is achieved considering a de- iables are discrete/nominal, and thus estimation is
pendence of one on another, assuming that the rather made level-to-point. Therefore, the ways
change in the first is dictated by the other. Under a we assess the effect for these two models slighlty
formal standpoint, the relationship is a function differ, although the conceptual frame is similar.
between the two (in simplest case) variables, of
which one is dependent (Y) and the other is inde- With respect to linear regression with one inde-
pendent (X). The easiest way to give so is through pendent variable (predictor) and the intercept
a linear function of the well-known form Y = bX + term (which corresponds to the average value of


Y), the ES measure is given through the coefficient of determination or r² (eq. 3.1 in Table 1). Noteworthy, in this simplest form of the model, r² is nothing but the squared value of r (6). This should not be surprising, because if a relationship is present between the variables, then it can be used to achieve prediction, so that the stronger the relationship, the better the prediction. For multiple linear regression, where we have more than one predictor, we can use Cohen's f² instead (eq. 3.3a in Table 1), in which the r² is corrected by the amount of variation that the predictors leave unexplained (4). Sometimes the adjusted r² (eq. 3.2 in Table 1) is presented alongside r² in multiple regression, in which the correction is made for the number of predictors and cases. It should be noticed that such a quantity is not a measure of effect, but rather shows how suitable the actual set of predictors is with respect to the model's predictivity.

With respect to ANOVA, the linear model is rather used in order to describe how Y varies when the changes in X are discrete. Thus, the effect can be thought of as a change in the clustering of Y with respect to the value of X, termed the factor. In order to assess the magnitude of the effect, it is necessary to show how much the clustering explains the variability (where the observations of Y locate at the change of X) with respect to the overall variability observed (the scatter of all the observations of Y). Therefore, we can write the general form of any ES measure of this kind:

ES = explained variability / total variability

Recalling the law of variance decomposition, for a 1-way ANOVA the quantity above can be obtained through the eta-squared (η²), in which the variation between clusters or groups accounts for the variability explained by the factor within the design (eq. 3.4 in Table 1 and Example 4) (4,6). The careful reader will recognize at this point the analogies between r² and η², with no need for any further explanation.

It must be emphasized that η² tends to inflate the explained variability, giving quite larger ES estimates than it should (16). Moreover, in models with more than one factor it tends to underestimate the ES as the number of factors increases (17). Thus, for designs with more than one factor it is advisable to use the partial-η² instead (eq. 3.5), remarking that the equation given herein is just a general form and the precise form of its terms depends on the design (18). Noteworthy, η² and partial-η² coincide in the case of 1-way ANOVA (19,20). A most regarded ES for ANOVA, which is advisable

Example 4
Recalling the ANOVA table seen in Example 2, we can compute η² accordingly:

η² = SSfactor / SStotal = 22.5 / 24.1 = 0.93

Thereafter, for ω² we get instead:

ω² = (SSfactor − (k − 1) × MSE) / (SStotal + MSE) = (22.5 − 2 × 0.04) / (24.1 + 0.04) = 0.93

If we recall the value we previously got for ψ (4.51), we notice a considerable difference between the two. Actually, ψ can be influenced by a single large deviating average within the groups; therefore the omnibus effect should be regarded as merely indicative of the phenomenon under investigation. Noteworthy, it should be possible to assess the contrast ES (e.g. largest average vs. the others) by properly rearranging the Hedge's g.
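The two ANOVA-family measures of Example 4 reduce to simple arithmetic on the sums of squares; a Python sketch (an illustration added here, with assumed function names) using the values from the ANOVA table of Example 2:

```python
def eta_squared(ss_factor, ss_total):
    # eq. 3.4 -- proportion of total variability explained by the factor
    return ss_factor / ss_total

def omega_squared(ss_factor, ss_total, df_factor, mse):
    # eq. 3.6 -- nearly unbiased variant of eta-squared
    return (ss_factor - df_factor * mse) / (ss_total + mse)

# SSfactor = 22.5, SStotal = 24.1, DF between = 2, MS within = 0.04
print(round(eta_squared(22.5, 24.1), 2))             # 0.93
print(round(omega_squared(22.5, 24.1, 2, 0.04), 2))  # 0.93
```

With such a large effect the bias correction barely matters; for small effects and few cases, η² and ω² diverge appreciably, which is why ω² is preferred.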


to use in place of any other ES measure in that it is virtually unbiased, is the omega-squared (ω²) (eq. 3.6 in Table 1 and Example 4) (16,18,21). Lastly, it should be noticed that Cohen's f² can also suit n-way ANOVA (eq. 3.3b) (4). It should be emphasized that in general it holds that η² > partial-η² > ω².

Odds ratio

The odds ratio (OR) can be regarded as a peculiar kind of ES measure because it suits both 2 x 2 contingency tables as well as non-linear regression models like logistic regression. In general, the OR can be thought of as a special kind of association-family ES for dichotomous (binary) variables. In plain words, the OR represents the likelihood that an event occurs due to a certain factor, against the probability that it arises just by chance (that is, when the factor is absent). If there is an association, then the effect changes the rate of outcomes between groups. For 2 x 2 tables (like Table 2) the OR can be easily calculated using the cross product of the cell frequencies (eq. 4.1a in Table 1 and Example 5A) (22).

Table 2. 2 x 2 nominal table for odds ratio calculation

Factor (X) | Outcome (Y) = 1 | Outcome (Y) = 0
1 | x1y1 (Ppresent) or a | x1y0 (1 − Ppresent) or b
0 | x0y1 (Pabsent) or c | x0y0 (1 − Pabsent) or d

1 – presence; 0 – absence. The terms presence and absence refer to the factor as well as to the outcome. a, b, c, d – common coding of cell frequencies used for the cross-product calculation.

However, the OR can also be estimated by means of logistic regression, which can be considered similar to a linear model in which the dependent variable (termed the outcome in this model) is binary. Indeed, a logistic function is used instead of a linear model, in that the outcome abruptly changes between two separate statuses (present/absent), so that prediction has to be modelled level-to-level (23). In such a model, finding the weight of the design (that is, b in the GLM) is tricky, but using a logarithmic transformation it is still possible to esti-

Example 5A
Getting the OR from 2 x 2 tables is trivial and can easily be achieved by hand calculation, as with the table below:

Factor | Outcome present | Outcome absent
present | 44 | 23
absent | 19 | 31

Therefore, using eq. 4.1a in Table 1, it can be calculated:

OR = (44 × 31) / (23 × 19) = 1364 / 437 = 3.12

It is noteworthy that in this case Cramer's V also gave an intermediate ES (0.275). Nonetheless, they represent quite distant concepts, in that Cramer's V aims to show whether the variability within the cross-tab frame is due to the factor, while the OR shows how the factor changes the rate of outcomes in a non-additive way.
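Both quantities compared in Example 5A follow directly from the four cell counts; a Python sketch (illustrative, with assumed function names; for a 2 x 2 table Cramer's V coincides with the phi coefficient of eq. 2.4):

```python
from math import sqrt

def odds_ratio(a, b, c, d):
    # eq. 4.1a -- cross product of the 2 x 2 cell frequencies
    return (a * d) / (b * c)

def phi_coefficient(a, b, c, d):
    # eq. 2.4 -- nominal association for the same table
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Cell counts from Example 5A
print(round(odds_ratio(44, 23, 19, 31), 2))       # 3.12
print(round(phi_coefficient(44, 23, 19, 31), 3))  # 0.275
```

The two numbers agree with the hand calculations above, while measuring, as the text notes, quite different things.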


mate it through a linear function. It is possible to show that b (usually regarded as beta in this framework) is the exponent of a base (the Euler's number, e) which gives the OR (23). Noteworthy, each time there is a unit increase in the predictor, the effect changes in a multiplicative rather than additive fashion, differently from what is seen in the GLM. A major advantage of logistic regression lies in its flexibility with respect to cross tables, in that it is possible to estimate the ES accounting for covariates and for factors that are more than binary (multinomial logistic regression). Moreover, through logistic regression it is also possible to obtain the OR for each factor in a multifactor analysis, similarly to what is done through the GLM.

Confidence interval

Considering that they are estimates, it is possible to give a confidence interval (CI) for ES measures as well, with the general rules holding also in this case, so that the narrower the interval, the more precise the estimate (24). However, this is not a simple task to achieve, because the ES has a non-central distribution, as it represents a non-null hypothesis (25). The methods devised to overcome such a pitfall would deserve a broader discussion which would take us far beyond the scope of this paper (10,11,26). Nonetheless, quite easy methods based on the estimation of the ES variance can be found, and they have been shown to work properly up to mild-sized effects, as for Cohen's d (Example 6) (25). For instance, the CI estimation regarding the OR and φ can be easily achieved from the cell frequencies of the 2 x 2 table (Example 5B) (6).

We would remark that although the CI of an ES might exquisitely concern meta-analysis, it actually represents the most reliable proof of the ES reliability. An aspect which deserves attention in this regard is that the CI reminds us that any ES actually measured is just an estimate taken on a sample, and as such it depends on the sample size and variability. It is sometimes easy to misunderstand or forget this, and often the ES obtained through an experiment is erroneously confused with the one hypothesized for the population (27). In this regard, running a power analysis after the fact would be helpful. Indeed, supposing the population ES to be greater than or at least equal to the one actually measured, it would show the adequacy of our

Example 5B
In order to calculate the CI of the OR from Example 5A, it is necessary to compute the standard error (SE) of its natural logarithm from the cell frequencies as follows:

SE = √(1/44 + 1/23 + 1/19 + 1/31) = 0.389

First, it is necessary to transform the OR taking its natural logarithm (ln), in order to use the normal distribution to get the confidence coefficient (the one which corresponds to the α level). Therefore we get ln(3.12) = 1.14, so that:

95% CI of ln(OR) = 1.14 ± 1.96 × 0.389 = 0.38 to 1.90

A back transformation through the exponential function makes it possible to get this result in the original scale. Hence, since e^0.38 = 1.46 and e^1.90 = 6.69, the 95% CI is 1.46 to 6.69. Noteworthy, if the interval does not comprise the value 1 (recalling that ln(1) = 0), the OR and in turn the ES estimate can be considered significant. However, we must note that the range of the CI is quite wide, so the researcher should pay attention when commenting on the point estimate of 3.12.
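The whole procedure of Example 5B can be sketched as a single Python function (illustrative, with an assumed name; figures may differ in the last digit from hand calculations that round intermediate steps):

```python
from math import exp, log, sqrt

def or_confidence_interval(a, b, c, d, z=1.96):
    # 95% CI of the odds ratio via the normal approximation of ln(OR):
    # exponentiate ln(OR) +/- z * SE, where SE is computed from the
    # reciprocals of the four cell frequencies.
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return exp(log(or_) - z * se), exp(log(or_) + z * se)

lo, hi = or_confidence_interval(44, 23, 19, 31)
print(round(lo, 2), round(hi, 2))  # 1.46 6.69
```

Since the lower bound stays above 1, the OR is significant at the 0.05 level, in agreement with the hand calculation.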


Example 6
Using the data from Example 1 (d = 0.581, with 15 subjects per group), we can calculate the variance of the Cohen's d estimate with the following equation:

Var(d) = (n1 + n2) / (n1 × n2) + d² / (2 × (n1 + n2)) = 30 / 225 + 0.581² / 60 = 0.139

Then, we can use this value to compute the 95% CI accordingly:

95% CI = d ± 1.96 × √Var(d) = 0.581 ± 1.96 × 0.373

Therefore the estimate falls within the interval ranging from -0.150 to 1.312. Interestingly, this shows that the value of the ES estimated through that design was unreliable, because the confidence interval includes the zero value. Indeed, the experimental design mentioned above gave a non-statistically significant result when the average difference between the two groups was tested by means of the unpaired t-test. This is in accordance with the finding of an underpowered design, which is unable to show a difference if there is one, or to give any valid measure of it.
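As a check on the arithmetic, the variance formula and the resulting interval can be reproduced in a short sketch, assuming (per Example 1) d = 0.581 with 15 subjects per group:

```python
from math import sqrt

def cohens_d_ci(d, n1, n2, z=1.96):
    """Approximate CI for Cohen's d using the large-sample variance
    var(d) = (n1 + n2)/(n1*n2) + d**2 / (2*(n1 + n2))."""
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    half_width = z * sqrt(var_d)
    return d - half_width, d + half_width

# Reproduces the interval of Example 6: roughly -0.150 to 1.312.
low, high = cohens_d_ci(0.581, 15, 15)
print(round(low, 3), round(high, 3))
```

Because the interval straddles zero, the same conclusion follows as in the text: the design was too weak to pin the effect down.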

experimental setting with respect to a hypothesis as large as the actual ES (28). Such a proof will surely guide our judgment regarding the proper interpretation of the P-value obtained through the same experiment.

Conversion of ES measures

Maybe the most intriguing aspect of ES measures is that it is possible to convert one kind of measure into another (4,25). Indeed, an effect is what it is regardless of the way it is assessed, so that changing the form of the measure is nothing but changing the gauge we use for measuring. Although it might look appealing, this is a somewhat useless trick except for meta-analysis. Moreover, it might even be misleading if one forgets what each kind of ES measure represents and is meant for. This kind of "lost-in-translation" is quite common when the conversion is made between ES measures belonging to different families (Example 7).

Conversely, it seems more useful to obtain an ES measure from the test statistic whenever the reported results lack any other means to get the ES (4,13,21). However, as in the case of Cohen's d from the t-statistic, it is necessary to know the t score as well as the size of each sample (Example 7).

Interpreting the magnitude of ES

Cohen gave some rules of thumb to qualify the magnitude of an effect, also giving thresholds for categorization into small, medium and large sizes (4). Unfortunately, they were set based on the kind of phenomena which Cohen observed in his own field, so that they are hardly translatable to domains outside the behavioural sciences. Indeed, there is no means to give any universal scale, and the values which we take as reference nowadays are just a heritage we owe to the way the study of ES was commenced. Interestingly, Cohen as well as other researchers have tried to interpret the different size ranges using an analogy between ES and the Z-score, whereby there is a direct correspondence between the value and the probability of correctly recognizing the presence of the investigated phenomenon from a single observation (29). Unfortunately, although alluring, this "percentile-like" interpretation is insidious in that it relies on the assumption that the underlying distribution is normal.
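Under that normality assumption, the "percentile-like" reading of d corresponds to Cohen's U3: the standard normal CDF evaluated at d, i.e. the proportion of the comparison distribution lying below the mean of the other group. A minimal sketch:

```python
from math import erf, sqrt

def u3(d):
    """Cohen's U3: proportion of the comparison group falling below the
    mean of the other group, assuming normal distributions with equal
    variance (the very assumption the text warns about)."""
    # Standard normal CDF evaluated at d, via the error function.
    return 0.5 * (1.0 + erf(d / sqrt(2.0)))

# A 'medium' effect (d = 0.5) shifts the mean to roughly the 69th
# percentile of the comparison distribution; d = 0 leaves it at 50%.
for d in (0.0, 0.2, 0.5, 0.8):
    print(d, round(u3(d), 2))
```

If the underlying distribution is not normal, these percentages no longer hold, which is exactly why the interpretation is called insidious above.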


Example 7
The data which were used to generate scatterplot B of Example 3 are compared herein by means of the unpaired t-test. Considering the average values of 16 ± 6 and 15 ± 6, we obtained a t-statistic of 0.453. Hence, the corresponding Cohen's d was:

d = t × √((n1 + n2) / (n1 × n2)) = 0.453 × √((15 + 15) / (15 × 15)) = 0.165

It should be noticed that panel B of Example 3 reported a correlation close to 0, that is, no effect, as we stated previously. For the same groups, let us now calculate Cohen's d from r:

d = 2r / √(1 − r²) ≈ 0

Not surprisingly, we obtain a negligible effect. Let us now try again with the data which produced the scatterplot of panel A. While the statistical test gives back the same result, this time the value of d obtained through the same formula changes dramatically, because a strong r yields a large d.

The explanation is utterly simple. The unpaired t-test is not affected by the order of observations within each group, so shuffling the data makes no difference. Conversely, the correlation coefficient relies on data ordering, in that it gives a sense to each pair of observations it is computed with. Thus, computing d through r gives an ES estimate which is nothing but the difference, or offset, between observations that would have been produced by an effect as large as the one which produced an equally strong association.
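The two conversions used in Example 7 are easy to express in code. This is a sketch: n = 15 per group is assumed, and the r values below are illustrative placeholders, since the exact correlations of Example 3 are not reproduced here.

```python
from math import sqrt

def d_from_t(t, n1, n2):
    """Cohen's d recovered from an unpaired t statistic and group sizes."""
    return t * sqrt((n1 + n2) / (n1 * n2))

def d_from_r(r):
    """Cohen's d recovered from a correlation coefficient."""
    return 2 * r / sqrt(1 - r ** 2)

# The small t-based effect of Example 7 (t = 0.453, n = 15 per group).
print(round(d_from_t(0.453, 15, 15), 3))
# An r near 0 converts to a negligible d, while a strong r converts to a
# large one, mirroring the contrast between panels B and A of Example 3.
print(round(d_from_r(0.05), 2), round(d_from_r(0.9), 2))
```

Note that d_from_t needs both group sizes, which is exactly why the text stresses that the t score alone is not enough for the conversion.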

An alternative way of figuring out ES magnitude relies on its "contextualization", that is, taking its value with respect to any other known available estimate, as well as to the biological or medical context it refers to (30). For instance, in complex disease association studies, where single nucleotide polymorphisms usually have an OR ranging around 1.3, evidence of an OR of 2.5 should not be regarded as moderate (31).

Computing ES

The calculation of ES is part of the power analysis framework, thus the computation of its measures is usually provided embedded within statistical software packages or achieved through stand-alone applications (30,32). For instance, the software package Statistica (StatSoft Inc., Tulsa, USA) provides a comprehensive set of functions for power analysis, which allows computing the ES as well as the CI for many statistical ES measures (33). Alternatively, the freely available application G*Power (Heinrich Heine Universität, Düsseldorf, Germany) makes it possible to run numerous stand-alone ES calculations with respect to the different statistical test families (34,35). Finally, it is possible to find online many comprehensive suites of calculators for different ES measures (36-38).

Notwithstanding, it should be noted that any ES measure shown in the tables within this paper can be computed with the basic (not statistical) functions available through a spreadsheet like MS


Excel (Microsoft Corp., Redmond, USA). In this regard, the Analysis ToolPak embedded in MS Excel makes it possible to get information for both ANOVA and linear regression (39).

Conclusions (Are we ready for the effect size?)

In conclusion, the importance of providing an estimate of the effect alongside the P-value should be emphasized, as it is the added value to any research, representing a step toward scientific trueness. For this reason, researchers should be encouraged to show the ES in their work, particularly reporting it any time the P-value is mentioned. It would also be advisable to provide a CI along with the ES, but we are aware that in many situations this could be rather discouraging, as there is still no means for its computation as accessible as there is for the ES. In this regard, calculators might be of great help, although researchers should always bear the formulae in mind, to recall what each ES is suited for and what information it actually provides.

In the introduction of this paper, we wondered whether negative findings were actually decreasing in scientific research, or whether we were observing a kind of as yet unexplained bias. Of course, the dictating paradigm of the P-value leads to forgetting what scientific evidence is and what its statistical assessment means. Nonetheless, through the ES we could start teaching ourselves to weigh findings against both chance and magnitude, and that would be a huge help in our appreciation of any scientific achievement. Incidentally, we might also realize that the bias probably lies in the way we conceive negative and positive things, the reason why we tend to regard scientific research as nothing but a "positive" endeavour regardless of the size of what it comes across.

Potential conflict of interest

None declared.

References
1. Fanelli D. Negative results are disappearing from most disciplines and countries. Scientometrics 2011;90:891-904. http://dx.doi.org/10.1007/s11192-011-0494-7.
2. Lehmann EL, editor. Fisher, Neyman, and the creation of classical statistics. New York, NY: Springer, 2011.
3. Lin M, Lucas HC, Shmueli G. Too big to fail: large samples and the p-value problem. Inform Syst Res 2013;24:906-17. http://dx.doi.org/10.1287/isre.2013.0480.
4. Cohen J, editor. Statistical power analysis for the behavioral sciences. 2nd ed. Mahwah, NJ: Lawrence Erlbaum Associates, 1988.
5. Cohen J. A power primer. Psychol Bull 1992;112:155-9. http://dx.doi.org/10.1037/0033-2909.112.1.155.
6. Armitage P, Berry G, Matthews JNS, eds. Statistical methods in medical research. 4th ed. Osney Mead, Oxford: Blackwell Publishing, 2007.
7. Lieber RL. Statistical significance and statistical power in hypothesis testing. J Orthop Res 1990;8:304-9. http://dx.doi.org/10.1002/jor.1100080221.
8. Hedges LV. Distribution theory for Glass's estimator of effect size and related estimators. J Educ Stat 1981;6:106-28. http://dx.doi.org/10.2307/1164588.
9. Zakzanis KK. Statistics to tell the truth, the whole truth, and nothing but the truth: formulae, illustrative numerical examples, and heuristic interpretation of effect size analyses for neuropsychological researchers. Arch Clin Neuropsychol 2001;16:653-67. http://dx.doi.org/10.1093/arclin/16.7.653.
10. Steiger JH, Fouladi RT. Noncentrality interval estimation and the evaluation of statistical models. In: Harlow LL, Mulaik SA, Steiger JH, eds. What if there were no significance tests? Mahwah, NJ: Lawrence Erlbaum Associates, 1997. p. 221-258.
11. Steiger JH. Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychol Methods 2004;9:164-82. http://dx.doi.org/10.1037/1082-989X.9.2.164.
12. Thalheimer W, Cook S. How to calculate effect sizes from published research articles: A simplified methodology. Available at: http://www.bwgriffin.com/gsu/courses/edur9131/content/Effect_Sizes_pdf5.pdf. Accessed February 1st 2016.
13. Dunst CJ, Hamby DW, Trivette CM. Guidelines for calculating effect sizes for practice-based research syntheses. Centerscope 2004;2:1-10.
14. Tomczak M, Tomczak E. The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends Sport Sci 2014;1:19-25.


15. Bewick V, Cheek L, Ball J. Statistics review 7: Correlation and regression. Crit Care 2003;7:451-9. http://dx.doi.org/10.1186/cc2401.
16. Olejnik S, Algina J. Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemp Educ Psychol 2000;25:241-86. http://dx.doi.org/10.1006/ceps.2000.1040.
17. Ferguson CJ. An effect size primer: a guide for clinicians and researchers. Prof Psychol Res Pract 2009;40:532-8. http://dx.doi.org/10.1037/a0015808.
18. Olejnik S, Algina J. Generalized eta and omega squared statistics: Measures of effect size for some common research designs. Psychol Methods 2003;8:434-47. http://dx.doi.org/10.1037/1082-989X.8.4.434.
19. Pierce CA, Block RA, Aguinis H. Cautionary note on reporting eta-squared values from multifactor ANOVA designs. Educ Psychol Meas 2004;64:916-24. http://dx.doi.org/10.1177/0013164404264848.
20. Levine TR, Hullett CR. Eta squared, partial eta squared, and misreporting of effect size in communication research. Hum Commun Res 2002;28:612-25. http://dx.doi.org/10.1111/j.1468-2958.2002.tb00828.x.
21. Keppel G, Wickens TD, eds. Design and analysis: A researcher's handbook. 4th ed. Englewood Cliffs, NJ: Prentice Hall, 2004.
22. McHugh ML. The odds ratio: calculation, usage, and interpretation. Biochem Med (Zagreb) 2009;19:120-6. http://dx.doi.org/10.11613/BM.2009.011.
23. Kleinbaum DG, Klein M, eds. Logistic regression: a self-learning text. 2nd ed. New York, NY: Springer-Verlag, 2002.
24. Simundic AM. Confidence interval. Biochem Med (Zagreb) 2008;18:154-61. http://dx.doi.org/10.11613/BM.2008.015.
25. Fritz CO, Morris PE, Richler JJ. Effect size estimates: Current use, calculations, and interpretation. J Exp Psychol Gen 2012;141:2-18. http://dx.doi.org/10.1037/a0024338.
26. Nakagawa S, Cuthill IC. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol Rev Camb Philos Soc 2007;82:591-605. http://dx.doi.org/10.1111/j.1469-185X.2007.00027.x.
27. O'Keefe DJ. Post hoc power, observed power, a priori power, retrospective power, prospective power, achieved power: Sorting out appropriate uses of statistical power analyses. Commun Methods Meas 2007;1:291-9. http://dx.doi.org/10.1080/19312450701641375.
28. Levine M, Ensom MH. Post hoc power analysis: an idea whose time has passed? Pharmacotherapy 2001;21:405-9. http://dx.doi.org/10.1592/phco.21.5.405.34503.
29. Coe R. It's the effect size, stupid: what effect size is and why it is important. Available at: http://www.cem.org/attachments/ebe/ESguide.pdf. Accessed February 1st 2016.
30. McHugh ML. Power analysis in research. Biochem Med (Zagreb) 2008;18:263-74. http://dx.doi.org/10.11613/BM.2008.024.
31. Ioannidis JP, Trikalinos TA, Khoury MJ. Implications of small effect sizes of individual genetic variants on the design and interpretation of genetic association studies of complex diseases. Am J Epidemiol 2006;164:609-14. http://dx.doi.org/10.1093/aje/kwj259.
32. McCrum-Gardner E. Sample size and power calculations made simple. Int J Ther Rehabil 2009;17:10-4. http://dx.doi.org/10.12968/ijtr.2010.17.1.45988.
33. StatSoft STATISTICA Help. Available at: http://documentation.statsoft.com/STATISTICAHelp.aspx?path=Power/Indices/PowerAnalysis_HIndex. Accessed February 1st 2016.
34. Faul F, Erdfelder E, Lang AG, Buchner A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods 2007;39:175-91. http://dx.doi.org/10.3758/BF03193146.
35. Faul F, Erdfelder E, Buchner A, Lang AG. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses. Behav Res Methods 2009;41:1149-60. http://dx.doi.org/10.3758/BRM.41.4.1149.
36. Lenhard W, Lenhard A. Calculation of effect size 2015. Available at: http://www.psychometrica.de/effect_size.html. Accessed February 1st 2016.
37. Wilson DB. Practical meta-analysis effect size calculator. Available at: http://www.campbellcollaboration.org/resources/effect_size_input.php. Accessed February 1st 2016.
38. Lyons LC, Morris WA. The Meta Analysis Calculator 2016. Available at: http://www.lyonsmorris.com/ma1/. Accessed February 1st 2016.
39. Harmon M. Effect size for single-factor ANOVA 2014. Available at: http://blog.excelmasterseries.com/2014/05/effect-size-for-single-factor-anova.html. Accessed February 1st 2016.

