Sample Size, Sampling, Stat Analysis
Thesis Writing
Biostatistical Considerations
• Sampling Method
• Data Entry
• Data Analysis
E-Workshop
Sample Size Calculation for Biomedical Research
• Research Question
• What is the Prevalence of HIV in India?
How to Cook Rice
Magic Number
• It's not that "30 in a sample group should be enough" for a study. It's that you need at least 30 before you can reasonably expect an analysis based upon the normal distribution (i.e. z test) to be valid.
(https://towardsdatascience.com/is-n-30-really-enough-a-popular-inductive-fallacy-among-dataanalysts95661669dd98#:~:text=%E2%80%9CA%20minimum%20of%2030%20observations,to%20trust%20your%20confidence%20interval.)
Why Sample Size Calculation?
Large Sample:
• Cost
• Time and
• Personnel
Why Sample Size Calculation? (contd.)
Small Sample:
• P Value
• How likely your results are due to chance
p value = 0.011
                              Reality
                              True                             False
Researcher's     True         Correct                          Type II error (or) Beta error
decision         False        Type I error (or) Alpha error    Correct; Power = (1 - beta)
$$n = \frac{Z^2 \, p \, q}{d^2}$$
Where
Z = Standardized Normal deviate (Z value)
p = Proportion or Prevalence of interest
q = 100 - p
d = Allowable error (precision) in the estimate of p
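[Figure: precision intervals. An estimate of 20% shown with limits 15% and 25%; a second interval with limits -2% and 8%]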
Example
From a pilot study it was reported that among male IT
professionals 28% preferred to have male contraception
methods. It was decided to use a 95% C.I. and to allow 10%
variability in the estimated 28%. How many subjects are
necessary to conduct the study?
p = 28%
q = 72%
Z = 1.96 for α at 0.05
d = 10% of 28% = 2.8
$$n = \frac{(1.96)^2 \times 28 \times 72}{(2.8)^2} = 987.8$$
That is, about 988 subjects.
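The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch of the formula n = Z² p q / d² with p, q and d all in percent; the function name `sample_size_proportion` is made up for illustration.

```python
import math

def sample_size_proportion(p, d, z=1.96):
    """Sample size for estimating a proportion: n = Z^2 * p * q / d^2.

    p and d are in percent; q is taken as 100 - p.
    """
    q = 100 - p
    return (z ** 2) * p * q / (d ** 2)

p = 28.0          # prevalence from the pilot study, in percent
d = 0.10 * p      # 10% of 28% = 2.8
n = sample_size_proportion(p, d)

print(round(n, 1))    # 987.8
print(math.ceil(n))   # 988, rounding up to whole subjects
```

Halving d quadruples n, because d enters the formula squared.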
Practical Examples
• Study 1:
• Estimating the Prevalence of Obesity among Doctors: A Cross-sectional Study
• Study 2:
• The burden of Hepatitis B infection in India
B: when mean is the parameter of our study
$$n = \frac{Z^2 \, S^2}{d^2}$$
Where
Z= Standardized Normal deviate (Z value)
S = Sample standard deviation
d = Clinically expected variation
Example
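In a health survey of schoolchildren it was found that the mean BMI
of 55 boys was 22.3 with a standard deviation of 2.1.
Consider the precision as 0.8.
Mean = 22.3
Standard deviation (S) = 2.1
Z = 1.96 for α at 0.05
d = 0.8
$$n = \frac{(1.96)^2 \times (2.1)^2}{(0.8)^2} \approx 26.5$$
That is, about 26 boys (27 if rounded up to the next whole subject).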
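The same calculation for a mean can be sketched in Python. The function name `sample_size_mean` is hypothetical; the inputs are the values from the BMI example.

```python
import math

def sample_size_mean(s, d, z=1.96):
    """Sample size for estimating a mean: n = Z^2 * S^2 / d^2."""
    return (z ** 2) * (s ** 2) / (d ** 2)

n = sample_size_mean(s=2.1, d=0.8)

print(round(n, 2))    # 26.47
print(math.ceil(n))   # 27, rounding up to whole subjects
```

The mean itself (22.3) does not enter the formula; only the standard deviation and the required precision do.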
Testing Hypothesis: Formulae & Problems
Analytical study
A: when proportion is the parameter of our study
Total 5 (10.4%) 43 48
P = (16.7 + 4.2)/2 = 10.4%
q = 89.6%
Zα = 1.96 for α at 0.05
Zβ = 1.282 for β at 0.10
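The formula for this slide did not survive in the text, so the sketch below is an assumption rather than the presenter's method. It uses one common textbook form for comparing two proportions, n = 2 (Zα + Zβ)² P q / d² per group, where P is the average of the two proportions (as computed above) and d is their difference. The function name is hypothetical.

```python
import math

def sample_size_two_proportions(p1, p2, z_alpha=1.96, z_beta=1.282):
    """Per-group sample size, assuming n = 2 (Za + Zb)^2 * P * q / d^2.

    p1 and p2 are in percent; P is their average and q = 100 - P.
    This formula is an assumed textbook form, not taken from the slide.
    """
    p_bar = (p1 + p2) / 2
    q_bar = 100 - p_bar
    d = abs(p1 - p2)
    return 2 * (z_alpha + z_beta) ** 2 * p_bar * q_bar / d ** 2

n = sample_size_two_proportions(16.7, 4.2)
print(math.ceil(n))   # per-group size, rounded up to whole subjects
```

With the two proportions from the slide (16.7% and 4.2%), this form gives roughly 126 subjects per group at α = 0.05 and β = 0.10.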
Where
Zα = Z value for α error
Zβ = Z value for β error
S = Common standard deviation between two groups
d = Clinically meaningful difference
Example
Duration of hospital stay (in hours)
Low Dose High Dose
Mean 61.7 33.2
SD 93.8 89.1
n 23 20
α = 0.05, β = 0.10
α = 0.05, β = 0.20
$$n = \frac{2 \, (Z_\alpha + Z_\beta)^2 \, S^2}{d^2}$$
Where
Zα = Z value for α error
Zβ = Z value for β error
S = Common standard deviation between two groups
d = Clinically meaningful difference
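The hospital-stay example can be worked through in Python, assuming the formula reads n = 2 (Zα + Zβ)² S² / d². The slides do not say how the common standard deviation is obtained; the sketch below assumes a pooled SD weighted by degrees of freedom. Both function names are hypothetical.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups (an assumed choice for S)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def sample_size_two_means(s, d, z_alpha=1.96, z_beta=1.282):
    """Per-group sample size: n = 2 (Za + Zb)^2 * S^2 / d^2."""
    return 2 * (z_alpha + z_beta) ** 2 * s ** 2 / d ** 2

s = pooled_sd(93.8, 23, 89.1, 20)   # SDs and group sizes from the example
d = 61.7 - 33.2                     # difference between the two means

n_90 = sample_size_two_means(s, d, z_beta=1.282)  # alpha 0.05, beta 0.10
n_80 = sample_size_two_means(s, d, z_beta=0.842)  # alpha 0.05, beta 0.20

print(math.ceil(n_90), math.ceil(n_80))
```

Accepting a larger β (lower power) reduces the required sample size, which is why the second setting needs fewer subjects per group than the first.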
Type I (Alpha) Error (two-sided)
Significance level    Z
0.05                  1.96
0.01                  2.58
0.001                 3.29

Type II (Beta) Error (one-sided)
Level                 Z
0.10                  1.282
0.15                  1.036
0.20                  0.842
0.25                  0.674
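These Z values can be reproduced from the standard normal distribution with Python's standard library. In this sketch the α values are treated as two-sided and the β values as one-sided, which is the reading that matches the tabulated numbers; the helper names are made up for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_two_sided(alpha):
    """Z value leaving alpha/2 in each tail (used for the alpha error)."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)

def z_one_sided(beta):
    """Z value leaving beta in one tail (used for the beta error)."""
    return STD_NORMAL.inv_cdf(1 - beta)

for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(z_two_sided(alpha), 3))

for beta in (0.10, 0.15, 0.20, 0.25):
    print(beta, round(z_one_sided(beta), 3))
```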
How to calculate
2. Do a pilot study.
Calculate the sample size based on the results of the
pilot study
[Figure: selected units 2, 5, 8, 11, 14, 17, 20, 23, 26, 29 (every third unit, starting from 2)]
[Figure: POPULATION, STRATA, SAMPLE]
Probability sampling
Cluster sampling:
[Figure: POPULATION, STRATA, SAMPLE]
Snowball Sampling:
Convenience Sampling:
[Figure: CLASS ROOM, TEACHER]
Data Analysis
Data
• Numerical facts collected for analysis or further study.
Quantitative Qualitative
• Continuous data
Can be measured precisely (possibly using an instrument)
Have units attached to them
E.g. Height, Weight…
• Discrete data
Countable data (always whole numbers)
E.g. Number of children, Number of patients
• Qualitative data
Cannot be measured
36 60 B Female
35 59 B Male
40 67 C Male
Summarization
Mean Range
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
$$\text{Standard deviation} = S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$
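As a small illustration of these summary measures, the sketch below computes the mean, range and standard deviation for the twelve "before diet" weights used in the paired-observations example of this workshop. It shows the SD with n in the denominator alongside the n - 1 version that most statistical software reports for a sample.

```python
import math
from statistics import pstdev, stdev

# "Before diet" weights (kg) from the paired-observations example
weights = [75, 60, 68, 98, 83, 89, 65, 78, 95, 80, 100, 108]

n = len(weights)
x_bar = sum(weights) / n                      # arithmetic mean
value_range = max(weights) - min(weights)     # range

# Standard deviation with n in the denominator
sd_n = math.sqrt(sum((x - x_bar) ** 2 for x in weights) / n)

# The same quantity from the standard library, plus the n - 1 version
sd_population = pstdev(weights)   # divides by n
sd_sample = stdev(weights)        # divides by n - 1

print(x_bar, value_range)         # 83.25 48
print(round(sd_n, 2), round(sd_sample, 2))
```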
When to use Arithmetic mean and SD
[Figure: Non-smoking mothers]
Normal data:
SD < ½ mean
Note: Applicable only to variables that cannot take negative values; it does not apply to variables that can be negative (e.g., rate of GFR change).
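The rule of thumb is easy to apply as a quick screen. The sketch below is only that, a screen and not a formal test of normality, and the function name is hypothetical. It is tried on two summaries quoted elsewhere in this workshop: the BMI of schoolboys (mean 22.3, SD 2.1) and the low-dose hospital stay (mean 61.7, SD 93.8).

```python
def passes_sd_rule(mean, sd):
    """Rule of thumb for non-negative variables: SD < half the mean."""
    return sd < mean / 2

print(passes_sd_rule(22.3, 2.1))    # True: mean and SD are a reasonable summary
print(passes_sd_rule(61.7, 93.8))   # False: data are probably skewed
```

When the rule fails, as for the hospital-stay data, a median with a range and a non-parametric test are usually the safer choice.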
Independent t test
Mann-Whitney U test
Paired t test
Wilcoxon signed-rank test
3.79 2.84
3.60 2.90
3.73 3.27
3.21 3.85
3.60 3.52
4.08 3.23
3.61 2.76
3.83 3.60
3.31 3.75
4.13 3.59
3.26 3.63
3.54 2.38
3.51 2.34
2.71
Paired Observations
A study was carried out to evaluate the effect of a new diet on weight
loss. The study population consisted of 12 people who had used the diet for
2 months; their weights before and after the diet are given below.

Patient No.   Weight before diet (kg)   Weight after diet (kg)
1             75                        70
2             60                        54
3             68                        58
4             98                        93
5             83                        78
6             89                        84
7             65                        60
8             78                        77
9             95                        90
10            80                        76
11            100                       94
12            108                       100
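For paired data like these, the paired t test works on the within-patient differences. The following is a minimal sketch using only the standard library; it computes the t statistic by hand rather than calling a statistics package, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

before = [75, 60, 68, 98, 83, 89, 65, 78, 95, 80, 100, 108]
after = [70, 54, 58, 93, 78, 84, 60, 77, 90, 76, 94, 100]

# Weight loss for each patient (before minus after)
diffs = [b - a for b, a in zip(before, after)]

d_bar = mean(diffs)                   # mean weight loss
s_d = stdev(diffs)                    # SD of the differences (n - 1 denominator)
se = s_d / math.sqrt(len(diffs))      # standard error of the mean difference
t = d_bar / se                        # paired t statistic
df = len(diffs) - 1                   # degrees of freedom

print(round(d_bar, 2), round(t, 2), df)   # 5.42 8.72 11
```

A t statistic of about 8.7 on 11 degrees of freedom is well beyond the two-sided 5% critical value of 2.201, so the mean weight loss of about 5.4 kg would be judged statistically significant.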
r = +0.03
Correlation
• Linear relationship between two variables
• Generally we are concerned with 2 numerical variables
• For example
• Waist circumference and BMI
• Both quantitative
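Pearson's correlation coefficient r measures the strength of such a linear relationship. The slides give no waist circumference or BMI values, so the sketch below uses the before and after weights from the paired-observations example as a stand-in pair of numerical variables. The function `pearson_r` is written out by hand for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Stand-in data: weights (kg) before and after the diet
before = [75, 60, 68, 98, 83, 89, 65, 78, 95, 80, 100, 108]
after = [70, 54, 58, 93, 78, 84, 60, 77, 90, 76, 94, 100]

r = pearson_r(before, after)
print(round(r, 2))
```

Values of r near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.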
Chi-Square Test
When is the Chi-Square test used?
• Nominal Variables
E.g. Gender, Blood Group
• Ordinal Variables
E.g. Birth Order, Severity of disease (absent, mild, moderate, severe)
Chi-Square Test
Example:
                    TB
            Yes           No            Total
HIV  Yes    24 (43.6%)    31 (56.4%)    55
     No     36 (24.2%)    113 (75.8%)   149
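The chi-square statistic for this 2 x 2 table can be computed from the observed and expected counts. This is a minimal sketch in plain Python; it uses the uncorrected Pearson chi-square (no continuity correction), which is an assumption, since the slides do not say which version is intended.

```python
# Observed counts: rows are HIV (Yes, No), columns are TB (Yes, No)
table = [[24, 31],
         [36, 113]]

row_totals = [sum(row) for row in table]          # [55, 149]
col_totals = [sum(col) for col in zip(*table)]    # [60, 144]
n = sum(row_totals)                               # 204

chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # Expected count if HIV and TB were independent
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)       # degrees of freedom = 1

print(round(chi2, 2), df)   # 7.34 1
```

With 1 degree of freedom the 5% critical value is 3.841, so a statistic of about 7.34 indicates an association between HIV status and TB in these data.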
Select a Statistical Test
Type of Data                      Measurement from Normal Population    Measurement from Non-normal Population
Compare two independent groups