Power Analysis
Anne Segonds-Pichon
v2020-09
Figure: the experimental cycle: Question → Experimental design → Choice of statistical tests → Sample size → Experiment → Data collection/storage → Data exploration → Data analysis → Results
Sample Size: Power Analysis
• Definition of power: probability that a statistical test will reject a false null hypothesis (H0).
• Translation: the probability of detecting an effect, given that the effect is really there.
• In a nutshell: the bigger the experiment (larger sample size), the bigger the power (the more likely it is to pick up a difference).
• Main output of a power analysis:
• Estimation of an appropriate sample size
• Too big: waste of resources
• Too small: may miss the effect (p > 0.05) and waste resources
• Grants: justification of the sample size
• Publications: reviewers ask for evidence of a power calculation
• Home Office: the 3 Rs (Replacement, Reduction and Refinement)
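To make the definition of power concrete, here is a minimal R sketch (not part of the original material) that estimates power by simulation: it generates many hypothetical experiments in which the effect really exists and counts how often a t-test rejects H0. The design values (10 per group, a true difference of 1, SD = 1) are assumptions chosen for illustration; the simulated value is compared with `power.t.test()` from R's stats package.

```r
# Sketch: power estimated by simulation vs the analytical value.
# Hypothetical design: 2 groups of 10, true difference = 1, SD = 1.
set.seed(42)
n_per_group <- 10
true_diff   <- 1
sd_data     <- 1
n_sim       <- 4000

# Simulate many experiments in which the effect really exists (H0 is false)
p_values <- replicate(n_sim, {
  control   <- rnorm(n_per_group, mean = 0,         sd = sd_data)
  treatment <- rnorm(n_per_group, mean = true_diff, sd = sd_data)
  t.test(control, treatment, var.equal = TRUE)$p.value
})

# Power = proportion of experiments that correctly reject H0
simulated_power  <- mean(p_values < 0.05)
analytical_power <- power.t.test(n = n_per_group, delta = true_diff,
                                 sd = sd_data, sig.level = 0.05)$power
print(c(simulated = simulated_power, analytical = analytical_power))

# Bigger experiment, bigger power: same effect, twice the sample size
power.t.test(n = 20, delta = true_diff, sd = sd_data)$power
```

The two estimates should agree to within simulation noise; doubling the sample size raises the power.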
What does Power look like? Null and alternative hypotheses
Figure: Control vs Treatment, Sample 1 and Sample 2 plotted on a quantitative variable (axis 0 to 60); the area under each distribution = 1.
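Figure: t distribution with 14 degrees of freedom, t(14): 95% of the area (0.95) lies between the critical values t = -2.1448 and t = 2.1448, leaving 0.025 in each tail (two-sided test, α = 0.05).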
• In hypothesis testing:
• the test statistic is compared to the critical value to determine significance
• Example of a test statistic: the t-value
• If the test statistic is more extreme than the critical value (for a two-sided test: |test statistic| > critical value): statistical significance and rejection of the null hypothesis
• Example: |t-value| > critical t-value
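A short R sketch of this decision rule, assuming a two-sided test at α = 0.05 with 14 degrees of freedom (for instance two groups of 8); the observed t-value is a made-up number for illustration. `qt()` returns the critical value, and comparing |t| with it gives the same decision as comparing the p-value with α.

```r
# Two-sided test at alpha = 0.05 with 14 degrees of freedom
# (hypothetical example: two groups of 8, so 8 + 8 - 2 = 14)
alpha  <- 0.05
df     <- 14
t_crit <- qt(1 - alpha / 2, df = df)   # upper critical value, about 2.1448
t_crit

# Hypothetical observed test statistic, for illustration only
t_obs <- 2.5
abs(t_obs) > t_crit                    # TRUE: reject H0 at the 5% level

# Equivalent decision via the two-sided p-value
p_value <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)
p_value < alpha
```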
To recapitulate:
• The null hypothesis (H0): H0 = no effect
• The aim of a statistical test is to decide whether or not to reject H0.
Statistical decision vs true state of H0:
Statistical decision | H0 true (no effect) | H0 false (effect)
Reject H0 | Type I error α (False Positive) | Correct (True Positive)
Do not reject H0 | Correct (True Negative) | Type II error β (False Negative)
• Power = 1 − β: the probability of correctly rejecting a false H0 (a true positive).
https://github.com/allisonhorst/stats-illustrations#other-stats-artwork
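The two error rates in the table can be checked in R. This sketch (hypothetical settings: 8 per group, SD = 1) simulates experiments in which H0 is true, so the rejection rate estimates α, and uses `power.t.test()` for a scenario in which H0 is false, so that β = 1 − power.

```r
set.seed(1)
n_sim <- 4000

# H0 true: both groups drawn from the same population (no effect)
p_h0 <- replicate(n_sim, t.test(rnorm(8), rnorm(8), var.equal = TRUE)$p.value)
type_I_rate <- mean(p_h0 < 0.05)   # false positive rate, close to alpha = 0.05

# H0 false: hypothetical true difference of 1 SD
pw   <- power.t.test(n = 8, delta = 1, sd = 1, sig.level = 0.05)$power
beta <- 1 - pw                     # type II error rate (false negatives)
c(type_I = type_I_rate, power = pw, type_II = beta)
```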
Sample Size: Power Analysis
• The larger the effect size, the smaller the experiment needs to be to detect it.
• How to determine it?
• Previous research, pilot study …
• Jacob Cohen
• Defined small, medium and large effects for different tests (e.g. Cohen's d = 0.2, 0.5 and 0.8 for a difference between 2 means)
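As an illustration of how Cohen's conventions translate into sample sizes, this R sketch asks `power.t.test()` for the per-group n needed to detect a small, medium or large difference between 2 means (two-sided, α = 0.05, power = 80%, values chosen for illustration). Setting `sd = 1` expresses `delta` in SD units, so `delta` plays the role of Cohen's d here.

```r
# Cohen's conventional effect sizes for a difference between two means (d)
d <- c(small = 0.2, medium = 0.5, large = 0.8)

# With sd = 1, delta is in SD units, i.e. delta = d
n_needed <- sapply(d, function(es)
  ceiling(power.t.test(delta = es, sd = 1, sig.level = 0.05, power = 0.8)$n))
n_needed   # per group: the larger the effect, the fewer subjects needed
```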
The effect size: how is it calculated?
The absolute difference
• It depends on the type of difference and the data
• Easy example: comparison between 2 means
Figure: absolute difference between the 2 means
• The bigger the effect (the absolute difference), the bigger the power, i.e. the bigger the probability of picking up the difference
http://rpsychologist.com/d3/cohend/
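A minimal R sketch of this relationship, with an assumed design of 10 per group and SD = 1: only the absolute difference (`delta`) changes, and power rises with it.

```r
# Hypothetical design: 10 per group, SD = 1; vary the absolute difference
deltas <- c(0.5, 1, 1.5, 2)
power_by_delta <- sapply(deltas, function(dl)
  power.t.test(n = 10, delta = dl, sd = 1, sig.level = 0.05)$power)
round(setNames(power_by_delta, deltas), 3)   # power increases with delta
```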
The effect size: how is it calculated?
The standard deviation
• The bigger the variability of the data, the smaller the power
Figure: H0 and H1 distributions, with the critical value marked.
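In R, this sketch (assumed: 10 per group, a fixed difference of 5 units) shows power falling as the SD grows, and that `power.t.test()` depends only on the ratio difference / SD, i.e. the standardised effect size (Cohen's d).

```r
# The same absolute difference (5 units) with increasing variability
diff_means <- 5
sds <- c(2.5, 5, 10)
power_by_sd <- sapply(sds, function(s)
  power.t.test(n = 10, delta = diff_means, sd = s)$power)
power_by_sd   # power decreases as the SD increases

# Only the ratio difference / SD matters (Cohen's d)
d     <- diff_means / sds[2]                                  # d = 1
p_raw <- power.t.test(n = 10, delta = diff_means, sd = sds[2])$power
p_std <- power.t.test(n = 10, delta = d, sd = 1)$power        # same value
```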
Power Analysis
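The power analysis depends on the relationship between 6 variables (fix any 5 and the 6th can be calculated):
• the difference of biological interest (effect size)
• the variability of the data (standard deviation)
• the significance level (α)
• the desired power of the experiment (1 − β)
• the sample size
• the alternative hypothesis (one- or two-sided test)
• In practice, it is difficult to reduce the variability of the data or to increase the contrast between means,
• so the most effective way of improving power is to increase the sample size.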
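Because these quantities are linked, `power.t.test()` solves for whichever one is left as `NULL`. A sketch with illustrative values (difference = 1 SD, α = 0.05, 80% power, two-sided, two independent groups):

```r
# Fix five quantities; n is left NULL (its default), so it is solved for
res_n <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.8,
                      type = "two.sample", alternative = "two.sided")
res_n$n                         # about 16.7 per group

# Round up to whole subjects, then solve for the power actually achieved
n_group   <- ceiling(res_n$n)
res_power <- power.t.test(n = n_group, delta = 1, sd = 1, sig.level = 0.05)
res_power$power                 # slightly above 0.8 because n was rounded up

# Or: which difference can n_group per group detect with 80% power?
res_delta <- power.t.test(n = n_group, sd = 1, sig.level = 0.05, power = 0.8)
res_delta$delta
```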
The sample size
Figure: a population of a continuous variable, and the means of samples of n = 3 and n = 30 drawn from it.
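The point of the figure can be reproduced with a small simulation. In this R sketch the population is assumed to be normal with mean 0 and SD 1 (a hypothetical choice); the spread of the sample means shrinks from about 1/√3 for n = 3 to about 1/√30 for n = 30, which is why larger samples give more precise estimates.

```r
set.seed(123)
# Hypothetical population: normal, mean 0, SD 1
means_n3  <- replicate(1000, mean(rnorm(3)))
means_n30 <- replicate(1000, mean(rnorm(30)))

# Observed spread of the sample means vs the theoretical standard error SD / sqrt(n)
c(sd_means_n3  = sd(means_n3),  theory_n3  = 1 / sqrt(3),
  sd_means_n30 = sd(means_n30), theory_n30 = 1 / sqrt(30))
```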
The sample size: the bigger the better?
• It takes huge samples to detect tiny differences but tiny samples to detect huge differences.
• What is the question?
• Is there a difference? (two-sided test)
• Is it bigger (or smaller)? (one-sided test)
• Free software for power calculations:
• R
• G*Power
• InVivoStat
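Both points can be checked with `power.t.test()`. In this sketch (SD = 1, α = 0.05, 80% power; all values illustrative), a tiny difference needs far more subjects than a huge one, and a one-sided question needs fewer subjects than a two-sided one. A one-sided test is only appropriate when the direction of the effect is decided before the data are collected.

```r
# Per-group n for 80% power at alpha = 0.05 (SD = 1)
n_tiny <- power.t.test(delta = 0.1, sd = 1, power = 0.8)$n   # tiny difference
n_huge <- power.t.test(delta = 2,   sd = 1, power = 0.8)$n   # huge difference

# Same difference, different question:
# "is there a difference?" (two-sided) vs "is it bigger?" (one-sided)
n_two <- power.t.test(delta = 1, sd = 1, power = 0.8, alternative = "two.sided")$n
n_one <- power.t.test(delta = 1, sd = 1, power = 0.8, alternative = "one.sided")$n
ceiling(c(tiny = n_tiny, huge = n_huge, two_sided = n_two, one_sided = n_one))
```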
Exercise 1
power.prop.test(n = NULL, p1 = NULL, p2 = NULL, sig.level = 0.05, power = NULL,
                alternative = c("two.sided", "one.sided"))
Exercise 2
power.t.test(n = NULL, delta = NULL, sd = 1, sig.level = 0.05, power = NULL,
             type = c("two.sample", "one.sample", "paired"),
             alternative = c("two.sided", "one.sided"))
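The `type` argument matters. In this sketch with illustrative values, for the same `delta` and `sd`, a paired design needs fewer pairs than a two-sample design needs subjects per group. Note that for `type = "paired"`, `sd` is the standard deviation of the within-pair differences, which usually has to be estimated separately.

```r
# Same delta and sd (illustrative), different designs
n_two_sample <- power.t.test(delta = 1, sd = 1, power = 0.8, type = "two.sample")$n
n_paired     <- power.t.test(delta = 1, sd = 1, power = 0.8, type = "paired")$n
c(two_sample_per_group = n_two_sample, paired_pairs = n_paired)
```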
Exercise 1:
• Scientists have come up with a solution that will reduce the number of lions being shot by farmers in Africa:
painting eyes on cows’ bottoms.
• Early trials suggest that lions are less likely to attack livestock when they think they’re being watched
• Fewer livestock attacks could help farmers and lions co-exist more peacefully.
• Pilot study over 6 weeks:
• 3 out of 39 unpainted cows were killed by lions; none of the 23 painted cows from the same herd were killed.
• Questions:
• Do you think the observed effect is meaningful to the extent that such a ‘treatment’ should be applied?
Consider ethics, economics, conservation …
• Run a power calculation to find out how many cows should be included in the study.
• Clue 1: power.prop.test()
• Clue 2: exactly one of the parameters must be passed as NULL, and that parameter is determined from the others.
http://www.sciencealert.com/scientists-are-painting-eyes-on-cows-butts-to-stop-lions-getting-shot
Exercise 1: Answer
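A possible R calculation for Exercise 1, using the pilot proportions given in the exercise (3/39 unpainted vs 0/23 painted cows killed), a two-sided test, α = 0.05 and 80% power (the last two are conventional choices, assumed here). The rounded result matches the balanced-design figure of n = 97 per group quoted for the cow example in the unequal sample sizes section.

```r
# Pilot data: 3/39 unpainted cows killed, 0/23 painted cows killed
p_unpainted <- 3 / 39
p_painted   <- 0 / 23

# n is left NULL (its default), so it is the parameter solved for
cows <- power.prop.test(p1 = p_unpainted, p2 = p_painted,
                        sig.level = 0.05, power = 0.8)
cows$n            # per group, before rounding
ceiling(cows$n)   # cows needed in each group
```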
Exercise 2: Answer
• To reach significance with a t-test, provided the preliminary results are to be trusted,
and to be confident in a difference between the 2 groups, we need about 10 arachnophobes in each group.
Unequal sample sizes
• Scientists often deal with unequal sample sizes
• No simple trade-off:
• if one needs 2 groups of 30, going for 20 and 40 will be associated with decreased power.
• Unbalanced design = bigger total sample
• Solution:
Step 1: power calculation for equal sample size
Step 2: adjustment
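• Cow example: balanced design: n = 97 per group,
but this time the unpainted group is 2 times bigger than the painted one (k = 2):
• Using the formula $N = \frac{2n(1+k)^2}{4k}$, we get a total of:
$N = \frac{2 \times 97 \times (1+2)^2}{4 \times 2} = 218.25 \approx 219$
Painted butts: $n_1 = N/(1+k) = 73$; unpainted butts: $n_2 = k \, n_1 = 146$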
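The adjustment can be wrapped in a small R helper. `unequal_sizes()` is a hypothetical function written for this illustration; it implements the formula N = 2n(1+k)²/(4k) from the cow example, where n is the per-group size of the balanced design and k the ratio of the larger group to the smaller one.

```r
# Hypothetical helper: adjust a balanced per-group n for a k:1 allocation
unequal_sizes <- function(n_balanced, k) {
  N_total <- ceiling(2 * n_balanced * (1 + k)^2 / (4 * k))  # total sample size
  n_small <- ceiling(N_total / (1 + k))                     # smaller group
  c(total = N_total, small_group = n_small, large_group = k * n_small)
}

unequal_sizes(n_balanced = 97, k = 2)   # painted = 73, unpainted = 146
```

With k = 1 the formula gives N = 2n, i.e. the balanced design.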
• Non-parametric tests never require more than 15% additional subjects compared with the equivalent parametric test,
provided that the distribution is not too unusual.
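As a rough rule of thumb based on the statement above, the parametric sample size can be inflated by 15% to get an upper bound for a non-parametric alternative (e.g. a Mann-Whitney test instead of a t-test). A sketch with illustrative values:

```r
# Parametric sample size per group (illustrative: difference = 1 SD, 80% power)
n_t  <- power.t.test(delta = 1, sd = 1, power = 0.8)$n
# Allow up to 15% more subjects for a non-parametric test
n_np <- ceiling(1.15 * n_t)
c(t_test = ceiling(n_t), non_parametric = n_np)
```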
• Non-significant p-value (> 0.05): no effect! Wait: how big was the sample?
• Big enough = enough power: no effect means no effect
• Not big enough = not enough power
• A meaningful difference may exist, but we missed it
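One way to answer "how big was the sample?" is to compute the power the experiment had to detect a difference of biological interest fixed in advance (not the difference observed). A sketch with hypothetical numbers: 5 per group, a meaningful difference of 1 SD.

```r
# Hypothetical finished experiment: 5 per group, non-significant result.
# Difference of biological interest, decided in advance: 1 SD.
achieved <- power.t.test(n = 5, delta = 1, sd = 1, sig.level = 0.05)$power
achieved   # well below 0.8: a real 1-SD difference would often be missed

# Sample size that would have given 80% power for the same difference
needed <- ceiling(power.t.test(delta = 1, sd = 1, power = 0.8)$n)
needed
```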