
Sample Size, Sampling, Stat Analysis

Uploaded by

syed
Copyright © All Rights Reserved

E-Workshop

Thesis Writing
Biostatistical Considerations

• Sample Size Calculation

• Sampling Method

• Data Entry

• Data Analysis

Sample Size Calculation for Biomedical Research
• Research Question
• What is the Prevalence of HIV in India?
How to Cook Rice

Courtesy: village cooking / Alamy Stock Photo, hreewhistleskitchen.com/how-to-cook-indian-rice-perfectly/
Sampling
Research Question
What is the prevalence of HIV in India?

• Researcher 1 studies 2 subjects, and one of them is HIV +ve.
• Results: 50% prevalence of HIV in India

• Researcher 2 studies the entire population of 140 crore.


• Results: 01% Prevalence of HIV in India
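A minimal sketch (not part of the original workshop) of why researcher 1's 50% tells us very little. It uses the Wilson score interval, one standard 95% confidence interval for a proportion, chosen here purely for illustration; the 100-out-of-200 study is hypothetical.

```python
import math

def wilson_ci(positives: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a proportion (returned as fractions)."""
    p_hat = positives / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Researcher 1: 1 HIV +ve out of 2 subjects -> point estimate 50%
low, high = wilson_ci(1, 2)
print(f"n = 2:   {low:.1%} to {high:.1%}")

# Hypothetical larger study with the same 50% point estimate
low, high = wilson_ci(100, 200)
print(f"n = 200: {low:.1%} to {high:.1%}")
```

With 2 subjects the interval runs from about 9.5% to 90.5%; with a hypothetical 200 subjects and the same estimate it narrows to about 43% to 57%.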
Confusions
• The HIV prevalence among adults in India is estimated to be 0.22% (0.16–0.30%) (Rajneesh Kumar Joshi et al., 2019, a study among 23,478 adults).

• Should I take 23,478 subjects, or more?

Misconception: The Magic Number

There is no such thing as a magic number when it comes to sample size calculations, and arbitrary numbers such as 30 must not be considered adequate. (Sitanshu Sekhar Kar et al., 2013)

• It's not that "30 in a sample group should be enough" for a study. It's that you need at least 30 before you can reasonably expect an analysis based upon the normal distribution (i.e. z test) to be valid. (https://towardsdatascience.com/is-n-30-really-enough-a-popular-inductive-fallacy-among-dataanalysts95661669dd98#:~:text=%E2%80%9CA%20minimum%20of%2030%20observations,to%20trust%20your%20confidence%20interval.)
Why Sample Size Calculation?

Too large a sample wastes:
• Cost
• Time
• Personnel

Too small a sample:
• Is unable to detect clinically important results.
Clinical Significance vs Statistical Significance

A possible antipyretic is tested in patients with the common cold.
500 receive the candidate drug.
500 receive a placebo control.
Temperatures are measured 4 hours after dosing.
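Statistical Significance

• P value: the probability of obtaining results at least as extreme as those observed if there were truly no difference, i.e. how likely the results are to be due to chance alone
• If p ≤ 0.05: results are statistically significant
• If p > 0.05: results are not statistically significant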
A possible antipyretic is tested in patients with the common cold.

           N     Mean     StDev   SE Mean
Drug      500   39.950   0.653   0.029
Control   500   40.058   0.699   0.031

p value = 0.011

Statistical significance: Yes, there is a reduction in temperature.
Clinical significance: No. The temperature fell by only about 0.1 °C.

Because the sample size is so large, we are able to detect a very small change in temperature.
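A minimal sketch reproducing the test from the summary table. It assumes a large-sample (normal approximation) comparison of two independent means; that choice is an assumption, since the slide does not say which test produced p = 0.011, and the function name is illustrative.

```python
import math

def two_sample_z_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample test of a difference in means from summary statistics.

    Uses the normal approximation, which is reasonable with 500 patients per
    group (a t test gives practically the same answer at this size).
    """
    se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)    # SE of the difference
    z = (mean1 - mean2) / se_diff
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))     # = 2 * (1 - Phi(|z|))
    return mean1 - mean2, z, p_two_sided

diff, z, p = two_sample_z_from_summary(39.950, 0.653, 500, 40.058, 0.699, 500)
print(f"difference = {diff:.3f} °C, z = {z:.2f}, p = {p:.3f}")
```

It gives a difference of −0.108 °C and p ≈ 0.012, close to the slide's 0.011; the small gap plausibly comes from rounding in the summary statistics.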
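• The sample should be neither very small nor very large: large enough for a clinically important difference to reach statistical significance, but not so large that clinically trivial differences become statistically significant.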

Your sample size is:

▪ Not based on your reference article's sample size
▪ Not based on your senior batchmate's sample size
▪ Not based on a rounded-off figure such as 100
▪ Not a magic number such as 30


Your sample size is based on your expected results:

▪ Expected prevalence/incidence
▪ Expected difference between the two treatment groups
▪ Expected sensitivity of a diagnostic test in comparison with a gold standard test
Who gives us the expected value?

▪ The results of a reference article
▪ The results of a pilot study
▪ Assuming the proportion as 50% (usually for prevalence/incidence studies done in the field)
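Assuming 50% is the conservative choice because, in the prevalence formula n = Z² × p × q / d², the product p × q (with q = 100 − p) is largest at p = 50%, so it yields the biggest sample for a given precision. A quick check, as a minimal sketch:

```python
# p * q (with q = 100 - p) is the only part of n = Z^2 * p * q / d^2 that
# depends on the expected prevalence; tabulate it for p = 10%, 20%, ..., 90%.
products = {p: p * (100 - p) for p in range(10, 100, 10)}
for p, pq in products.items():
    print(f"p = {p:2d}%  ->  p*q = {pq}")

worst_case_p = max(products, key=products.get)
print("p*q is largest at p =", worst_case_p, "%")
```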
Types of Medical Research

Estimation (prevalence/descriptive study):
- Given a proportion or prevalence
- Given a mean and standard deviation

Testing hypothesis (cohort/case-control/clinical trial):
- Given two proportions or incidence rates
- Given two group means and standard deviations
Type I Error and Type II Error

                                      Reality
                                      True                    False
Researcher's decision    True         Correct                 Type II (beta) error
                         False        Type I (alpha) error    Correct: Power = (1 − beta)

RQ: Does BCG protect against TB (is it effective)?

Null Hypothesis: ?
Power of the Study

Power = 1 − β, the complement of the beta (Type II) error: the probability that, if a true difference of the stated magnitude existed, the study would pick it up as statistically significant.
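A minimal sketch of power as 1 − β, using the usual normal approximation for a two-sided comparison of two means. The inputs come from the hospital-stay example later in this deck (difference 28.5 hours, common SD 90 hours); the function name `power_two_means` is illustrative.

```python
from statistics import NormalDist

def power_two_means(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation that ignores the negligible chance of a
    significant result in the wrong direction.
    """
    z = NormalDist()                               # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)             # 1.96 for alpha = 0.05
    se = sd * (2 / n_per_group) ** 0.5             # SE of the difference
    return z.cdf(diff / se - z_alpha)              # power = 1 - beta

print(f"{power_two_means(28.5, 90, 157):.2f}")
print(f"{power_two_means(28.5, 90, 210):.2f}")
```

About 157 patients per group give roughly 80% power and about 210 give roughly 90%, so a larger study is more likely to detect a true difference of the same size.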
Estimation: Formulae & Problems

Descriptive study

A: When a proportion is the parameter of our study

$$n = \frac{Z^2 \times p \times q}{d^2}$$

Where
Z = standardized normal deviate (Z value)
p = proportion or prevalence of interest (%)
q = 100 − p
d = clinically expected variation (precision, in %)
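Clinically expected variation/Precision

Scenario 01: A precision of 5% around an expected value of 20% gives a range of 15% to 25%.

Scenario 02: A precision of 5% around an expected value of 3% gives a range of −2% to 8%; a negative prevalence is impossible, so the precision must be chosen relative to the expected value.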
Example
From a pilot study, it was reported that among male IT professionals 28% preferred male contraception methods. It was decided to use a 95% CI and 10% variability in the estimated 28%. How many subjects are necessary to conduct the study?

p = 28%
q = 72%
Z = 1.96 for α at 0.05
d = 10% of 28% = 2.8

$$n = \frac{(1.96)^2 \times 28 \times 72}{(2.8)^2} = 987.8 \approx 988$$
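A minimal sketch of the prevalence formula n = Z² × p × q / d², with p, q and d in percent. The function name is illustrative; the second call reuses the hypertension-among-doctors figures quoted later in this deck (35.6%, absolute precision 6%).

```python
import math

def n_for_proportion(p, d, z=1.96):
    """Sample size to estimate a prevalence p (in %) within +/- d (in %)."""
    q = 100 - p
    return z**2 * p * q / d**2

# Pilot study: 28% prevalence, precision = 10% of 28% (relative) = 2.8%
n = n_for_proportion(28, 0.10 * 28)
print(round(n, 1), "->", math.ceil(n), "subjects")

# Hypertension among doctors: 35.6% prevalence, absolute precision 6%
print(math.ceil(n_for_proportion(35.6, 6)), "subjects")
```

Sample sizes are rounded up (`math.ceil`), because rounding down would give slightly less than the required precision.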
Practical Examples

• Study 1: Estimating the prevalence of obesity among doctors: a cross-sectional study
• Study 2: The burden of hepatitis B infection in India
B: When a mean is the parameter of our study

$$n = \frac{Z^2 \times S^2}{d^2}$$

Where
Z = standardized normal deviate (Z value)
S = sample standard deviation
d = clinically expected variation (precision)
Example
In a health survey of schoolchildren, the mean BMI of 55 boys was found to be 22.3 with a standard deviation of 2.1. Take the precision as 0.8.

Mean = 22.3
Standard deviation (S) = 2.1
Z = 1.96 for α at 0.05
d = 0.8

$$n = \frac{(1.96)^2 \times (2.1)^2}{(0.8)^2} = 26.47 \approx 27$$
(sample sizes are rounded up)
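A minimal sketch of the mean formula n = Z² × S² / d² applied to the BMI example; the function name is illustrative.

```python
import math

def n_for_mean(sd, d, z=1.96):
    """Sample size to estimate a mean within +/- d, given standard deviation sd."""
    return z**2 * sd**2 / d**2

n = n_for_mean(sd=2.1, d=0.8)
print(round(n, 2), "->", math.ceil(n), "children")
```

The raw result is about 26.47, which rounds up to 27 children.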
Testing Hypothesis: Formulae & Problems

Analytical study

A: When a proportion is the parameter of our study

$$n = \frac{(Z_\alpha + Z_\beta)^2 \times 2 \times p \times q}{d^2}$$

Where
Zα = Z value for the α level
Zβ = Z value for the β level
p = average percentage between the two groups
q = 100 − p
d = clinically meaningful difference between the two groups
Example (Aspirin)

                  Outcome
            Cured        Not cured   Total
High dose   4 (16.7%)    20          24
Low dose    1 (4.2%)     23          24
Total       5 (10.4%)    43          48

p = (16.7 + 4.2)/2 = 10.4%
q = 89.6%
d = 16.7 − 4.2 = 12.5
Zα = 1.96 for α at 0.05
Zβ = 1.282 for β at 0.10

$$n = \frac{(1.96 + 1.282)^2 \times 10.4 \times 89.6 \times 2}{(12.5)^2} \approx 125.4$$

i.e. 126 in each arm (sample sizes are rounded up).
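A minimal sketch of the two-proportion formula applied to the aspirin example (the function name is illustrative). It uses the unrounded average p = 10.45%, which gives about 125.9; the rounded 10.4% gives about 125.4, and either way the per-arm size rounds up to 126.

```python
import math

def n_two_proportions(p1, p2, z_alpha=1.96, z_beta=1.282):
    """Per-arm sample size to detect p1 vs p2 (both in %), using
    n = (Z_alpha + Z_beta)^2 * 2 * p * q / d^2 with p the average of p1 and p2."""
    p = (p1 + p2) / 2
    q = 100 - p
    d = abs(p1 - p2)                    # clinically meaningful difference
    return (z_alpha + z_beta) ** 2 * 2 * p * q / d**2

n = n_two_proportions(16.7, 4.2)
print(round(n, 1), "->", math.ceil(n), "per arm")
```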
B: When a mean is the parameter of our study

$$n = \frac{(Z_\alpha + Z_\beta)^2 \times 2 \times S^2}{d^2}$$

Where
Zα = Z value for α error
Zβ = Z value for β error
S = common standard deviation between the two groups
d = clinically meaningful difference
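Example
Duration of hospital stay (in hours)

        Low dose   High dose
Mean    61.7       33.2
SD      93.8       89.1
n       23         20

Difference d = 61.7 − 33.2 = 28.5 hours; common SD taken as 90 hours.

α = 0.05, β = 0.10 (power 90%):

$$n = \frac{(1.96 + 1.282)^2 \times 2 \times 90^2}{28.5^2} = 210$$

α = 0.05, β = 0.20 (power 80%):

$$n = \frac{(1.96 + 0.842)^2 \times 2 \times 90^2}{28.5^2} = 157$$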
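A minimal sketch of the two-means formula applied to the hospital-stay example at both power levels; the function name is illustrative, and the common SD of 90 hours is the value used in the worked calculation.

```python
import math

def n_two_means(sd, d, z_alpha=1.96, z_beta=1.282):
    """Per-group sample size: n = (Z_alpha + Z_beta)^2 * 2 * S^2 / d^2."""
    return (z_alpha + z_beta) ** 2 * 2 * sd**2 / d**2

d = 61.7 - 33.2          # clinically meaningful difference = 28.5 hours
for z_beta, power in [(1.282, "90%"), (0.842, "80%")]:
    n = n_two_means(sd=90, d=d, z_beta=z_beta)
    print(f"power {power}: n = {n:.1f} -> {math.ceil(n)} per group")
```

The raw values are about 209.6 and 156.6, which round up to 210 and 157 per group.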
Comparing Means: Pre and Post

$$n = 2 + \frac{(Z_\alpha + Z_\beta)^2 \times S^2}{d^2}$$

Where
Zα = Z value for α error
Zβ = Z value for β error
S = common standard deviation between the two groups
d = clinically meaningful difference
Type I (Alpha) Error

Significance level (α)   Zα (two-sided)
0.05                     1.96
0.01                     2.576
0.001                    3.29

Type II (Beta) Error

β      Power (1 − β)   Zβ
0.10   90%             1.282
0.15   85%             1.036
0.20   80%             0.842
0.25   75%             0.674
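A minimal sketch showing where these Z values come from, using the standard library's `statistics.NormalDist`: Zα is the two-sided cut-off (α/2 in each tail) and Zβ the one-sided cut-off for β.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Two-sided Type I error: Z_alpha cuts off alpha/2 in each tail
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: Z = {std_normal.inv_cdf(1 - alpha / 2):.3f}")

# One-sided Type II error: Z_beta cuts off beta in one tail
for beta in (0.10, 0.15, 0.20, 0.25):
    print(f"beta = {beta}: Z = {std_normal.inv_cdf(1 - beta):.3f}, power = {1 - beta:.0%}")
```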
How to calculate

1. Do a thorough literature search, get a valid reference, and calculate "n".
2. Do a pilot study, and calculate the sample size based on the results of the pilot study.
3. Assume the percentage as 50% (usually done for prevalence studies).
• Sample size calculation: how to write it

1. Assuming the prevalence of HIV in South India as 1%, with a confidence interval of 95% and an alpha level of 5%, a sample size of 15 subjects needs to be studied.

2. Assuming a difference of at least 28.5 hours between the experimental and control groups, with a power of 90%, an alpha level of 5% and a confidence interval of 95%, a sample size of 210 subjects needs to be taken in each arm.
Sample size calculation: how to write it in your research paper

Assuming the prevalence of HIV in South India as 1% (as reported by the pilot study in 2019), with a confidence interval of 95% and an alpha level of 5%, a sample size of 15 subjects needs to be studied.

Taking the prevalence of hypertension among doctors as 35.6% (Ramachandran A et al.) and with an absolute precision of 6%, at a 5% significance level the sample size was estimated to be 245.
Sampling

Probability sampling:
1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling

Non-probability sampling:
1. Convenience sampling
2. Quota sampling
3. Purposive sampling
4. Snowball sampling
Dept of Community Medicine


PROBABILITY SAMPLING TECHNIQUES
Probability sampling

Simple Random Sampling (SRS):

• Most commonly used sampling method.
• Each sampling unit has an equal chance of being selected into the sample.
• Can be done using the lottery method or the random number table method.
• The chance of selection bias is minimal.
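A minimal sketch of simple random sampling as a computer version of the lottery method: every unit in the sampling frame has the same chance of selection, and nobody is picked twice. The frame of 100 numbered units and the seed are hypothetical.

```python
import random

# Hypothetical sampling frame: 100 numbered units (e.g. patient IDs 1-100)
sampling_frame = list(range(1, 101))

rng = random.Random(2024)                   # fixed seed so the draw can be reproduced
sample = rng.sample(sampling_frame, k=10)   # lottery method: draw 10 without replacement
print(sorted(sample))
```

Recording the seed in the study protocol lets someone else reproduce exactly the same selection.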



Simple Random Sampling:
Probability sampling (contd.)

Systematic Random Sampling:

• The first unit is selected at random.
• Subsequent units are selected at an equal interval (the sampling interval).
• Precision is lower than with SRS.



Systematic Random Sampling: example

Population = 30 units, sample = 10 units
Sampling interval: 30/10 = 3
Random start = 2, so the selected units are: 2 5 8 11 14 17 20 23 26 29
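A minimal sketch of systematic random sampling for units numbered 1 to N; the function name is illustrative. Passing `start=2` reproduces the slide's example.

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Systematic random sampling of unit numbers 1..population_size."""
    k = population_size // sample_size           # sampling interval
    if start is None:
        start = rng.randint(1, k)                # random start within the first interval
    return list(range(start, population_size + 1, k))[:sample_size]

print(systematic_sample(30, 10, start=2))
```

Only the first unit is random; every later unit is fixed by the interval, which is why a hidden periodic pattern in the list can bias the sample.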


Probability sampling (contd.)

Stratified Random Sampling:

• Used when the population is heterogeneous.
• The population is divided into homogeneous subgroups called strata.
• From each stratum, individuals are selected using SRS or systematic sampling.

• Eg: Heights of students in a school.


Stratified Random Sampling: Population → Strata → Simple random sampling within each stratum → Sample
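A minimal sketch of stratified random sampling for the school example, assuming proportional allocation (the same sampling fraction in every stratum, which is one common choice). The classes, student IDs and seed are hypothetical.

```python
import random

# Hypothetical school: student IDs grouped into strata by class
strata = {
    "Class 6": [f"C6-{i}" for i in range(1, 41)],   # 40 students
    "Class 7": [f"C7-{i}" for i in range(1, 31)],   # 30 students
    "Class 8": [f"C8-{i}" for i in range(1, 31)],   # 30 students
}

def stratified_sample(strata, fraction, rng):
    """Proportional allocation: SRS of the same fraction within every stratum."""
    return {
        name: rng.sample(units, k=round(len(units) * fraction))
        for name, units in strata.items()
    }

sample = stratified_sample(strata, fraction=0.10, rng=random.Random(1))
for name, chosen in sample.items():
    print(name, len(chosen), chosen)
```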
Probability sampling (contd.)

Cluster sampling:

• Used to cover study individuals who are spread across a large population (country, state, …).
• The population is divided into clusters.
• Clusters are selected into the study using SRS.
• All the units in the selected clusters are completely enumerated.
• Useful for large populations.
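A minimal sketch of single-stage cluster sampling: clusters are drawn by SRS and then every unit inside the chosen clusters is included. The 20 villages of 25 households each, and the seed, are hypothetical.

```python
import random

# Hypothetical frame: 20 villages (clusters), each with 25 households
clusters = {f"Village {v}": [f"V{v}-H{h}" for h in range(1, 26)] for v in range(1, 21)}

rng = random.Random(7)
chosen_clusters = rng.sample(sorted(clusters), k=4)          # SRS of clusters

# Complete enumeration: every household in each selected cluster is studied
sample = [hh for name in chosen_clusters for hh in clusters[name]]
print(chosen_clusters, len(sample))
```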



Cluster Sampling:
Multistage Sampling (example: Mysuru and Hassan districts)

First stage: all the PHCs in the district (cluster sampling)
Second stage: PHCs selected by simple random sampling
Third stage: ASHA workers in the selected PHCs (systematic random sampling)
NON-PROBABILITY SAMPLING TECHNIQUES

Quota Sampling: Population → Strata → Non-probability sampling → Sample

Snowball Sampling:

Convenience Sampling: [Figure: a teacher and a classroom]
Data Analysis

Data
• Numerical facts that are collected for analysis or further study.

IP number   Name     Age   Gender
1234        Raj      23    M
8723        Jay      38    M
5522        Swathy   19    F
Data
• Quantitative: discrete, continuous
• Qualitative: nominal, ordinal

• Quantitative data
Can be measured using numbers.

• Continuous data
Can be measured precisely (often using an instrument) and have units attached to them.
Eg: Height, Weight…

• Discrete data
Countable data (always whole numbers).
Eg: Number of children, Number of patients

• Qualitative data
Cannot be measured numerically.

• Ordinal data: has an order
Eg: Cancer stages, Socio-economic status…

• Nominal data: has no order
Eg: Gender, Blood group…
Class number   Weight   Grade obtained   Gender
34             45       A                Male
36             60       B                Female
35             59       B                Male
40             67       C                Male

Grade A = >80% marks in exam
Grade B = 60–80% marks in exam
Grade C = <60% marks in exam
Data Entry
How to enter the data for unbiased analysis
• Data of heights of 100 college students

161 171 171 155 150 155
145 149 149 154 168 163
171 155 155 177 171 155
149 154 154 176 149 154
155 171 171 144 155 171
154 162 145 166 134 167

Data Analysis
Summarization of data

Measures of central tendency: Mean, Median, Mode
Measures of dispersion: Range, Mean deviation, Standard deviation, IQR (interquartile range)

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

$$\text{Standard deviation} = S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$
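A minimal sketch applying the two formulas to the 36 height values listed on the data-entry slide, cross-checked against Python's `statistics` module. The slide's SD divides by n (`statistics.pstdev`); for a sample, dividing by n − 1 (`statistics.stdev`) is also common, so both are printed.

```python
import math
import statistics

# The 36 height values shown on the data-entry slide
heights = [161, 171, 171, 155, 150, 155, 145, 149, 149, 154, 168, 163,
           171, 155, 155, 177, 171, 155, 149, 154, 154, 176, 149, 154,
           155, 171, 171, 144, 155, 171, 154, 162, 145, 166, 134, 167]

n = len(heights)
mean = sum(heights) / n                                        # x-bar = sum(x_i) / n
sd_n = math.sqrt(sum((x - mean) ** 2 for x in heights) / n)    # divides by n, as on the slide

# Cross-check the hand-written formulas against the standard library
assert math.isclose(mean, statistics.mean(heights))
assert math.isclose(sd_n, statistics.pstdev(heights))
print(f"n = {n}, mean = {mean:.1f}, SD (n) = {sd_n:.2f}, SD (n - 1) = {statistics.stdev(heights):.2f}")
```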
When to use the arithmetic mean and SD

[Figure: negatively skewed, normal, and positively skewed distributions]
Checking the NORMALITY

[Figure: histograms of the birth weight of the baby. Non-smoking mothers: mean = 3.59, SD = 0.37, N = 15. Heavy smoking mothers: mean = 3.20, SD = 0.49, N = 14.]

The distribution of data

Normal data: SD < ½ mean
Skewed / non-normal data: SD > ½ mean

Note: Applicable only to variables where negative values are impossible (so not, e.g., the rate of GFR change, which can be negative).

Ref: Altman DG, 1991
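A minimal sketch applying this rule of thumb to the birth-weight data from the example below (weights cannot be negative, so the rule applies). It uses the sample SD (n − 1); the dictionary layout is illustrative.

```python
import statistics

birth_weight = {
    "Non-smoking": [3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61,
                    3.83, 3.31, 4.13, 3.26, 3.54, 3.51, 2.71],
    "Heavy smoking": [3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23, 2.76,
                      3.60, 3.75, 3.59, 3.63, 2.38, 2.34],
}

for group, values in birth_weight.items():
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample SD (n - 1)
    looks_skewed = sd > mean / 2             # rule of thumb for non-negative data
    print(f"{group}: mean = {mean:.2f}, SD = {sd:.2f}, skewed by rule? {looks_skewed}")
```

In both groups the SD (about 0.37 and 0.49) is far below half the mean, so the rule gives no sign of skew.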


Statistical tests

Parametric methods      Non-parametric methods
Independent t test      Mann-Whitney U test
Paired t test           Wilcoxon signed rank test
ANOVA                   Kruskal-Wallis test

Example data (Independent Groups)
A study was conducted to compare the birth weights of children born to 15 non-smoking mothers with those of children born to 14 heavy smoking mothers.

Non-smoking mothers (n = 15)   Heavy smoking mothers (n = 14)
3.99                           3.18
3.79                           2.84
3.60                           2.90
3.73                           3.27
3.21                           3.85
3.60                           3.52
4.08                           3.23
3.61                           2.76
3.83                           3.60
3.31                           3.75
4.13                           3.59
3.26                           3.63
3.54                           2.38
3.51                           2.34
2.71
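A minimal sketch of Student's independent-samples t test (pooled variance) on these birth weights, using only the standard library. The critical value 2.052 for 27 degrees of freedom is taken from t tables; if SciPy is available, `scipy.stats.ttest_ind(non_smoking, heavy_smoking)` returns the same t together with a p-value.

```python
import math
import statistics

non_smoking = [3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61,
               3.83, 3.31, 4.13, 3.26, 3.54, 3.51, 2.71]
heavy_smoking = [3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23, 2.76,
                 3.60, 3.75, 3.59, 3.63, 2.38, 2.34]

def independent_t(a, b):
    """Student's independent-samples t statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a) +
                  (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2

t, df = independent_t(non_smoking, heavy_smoking)
T_CRIT = 2.052   # two-sided 5% critical value of t with 27 df (from t tables)
print(f"t = {t:.2f} on {df} df; significant at 5%? {abs(t) > T_CRIT}")
```

The statistic comes out at about 2.42 on 27 df, above 2.052, so the difference in mean birth weight is significant at the 5% level.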
Paired Observations

A study was carried out to evaluate the effect of a new diet on weight loss. The study population consisted of 12 people who used the diet for 2 months; their weights before and after the diet are given below.

               Weight (kg)
Patient No.    Before diet   After diet
1              75            70
2              60            54
3              68            58
4              98            93
5              83            78
6              89            84
7              65            60
8              78            77
9              95            90
10             80            76
11             100           94
12             108           100

The research question: does the diet make a difference?
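A minimal sketch of the paired t test on the diet data: the test works on each person's before-minus-after difference. The critical value 2.201 for 11 degrees of freedom is taken from t tables.

```python
import math
import statistics

before = [75, 60, 68, 98, 83, 89, 65, 78, 95, 80, 100, 108]
after = [70, 54, 58, 93, 78, 84, 60, 77, 90, 76, 94, 100]

diffs = [b - a for b, a in zip(before, after)]     # weight loss per person (kg)
n = len(diffs)
mean_d = statistics.mean(diffs)
se_d = statistics.stdev(diffs) / math.sqrt(n)
t = mean_d / se_d                                  # paired t statistic, df = n - 1

T_CRIT = 2.201   # two-sided 5% critical value of t with 11 df (from t tables)
print(f"mean loss = {mean_d:.2f} kg, t = {t:.2f} on {n - 1} df, significant? {abs(t) > T_CRIT}")
```

The mean loss is about 5.4 kg with t ≈ 8.7 on 11 df, far above 2.201.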


Example data (Three or More Independent Groups)
A study was conducted to assess the Hb levels of women of low, medium and high socioeconomic status.

SL No   Low (n = 20)   Medium (n = 18)   High (n = 12)
1       8.10           8.40              12.70
2       8.00           11.10             11.80
3       6.90           10.80             13.10
4       11.40          11.00             12.30
5       10.70          12.20             10.90
6       10.20          8.70              12.60
7       8.90           12.30             13.20
8       9.90           11.50             14.20
9       6.80           11.60             11.80
10      6.10           12.90             12.40
Or it can be zero or close to zero (e.g., r = +0.03).
Correlation
• Linear relationship between two variables
• Generally we are concerned with 2 numerical variables
• For example
• Waist circumference and BMI
• Both quantitative
Chi-Square Test
When is the chi-square test used?

• To test categorical data

• Nominal variables
Eg. Gender, Blood group

• Ordinal variables
Eg. Birth order, Severity of disease (absent, mild, moderate, severe)
Chi-Square Test

Null hypothesis: There is no association between the two variables.
Alternative hypothesis: There is an association between the two variables.

Example:
                       TB
              Yes          No            Total
HIV   Yes     24 (43.6%)   31 (56.4%)    55
      No      36 (24.2%)   113 (75.8%)   149
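A minimal sketch of the Pearson chi-square test on this 2×2 table, without Yates' continuity correction (that choice is an assumption; the slide does not specify one). Each expected count is row total × column total / grand total; for 1 degree of freedom the p-value can be obtained from the normal distribution because χ² = Z².

```python
import math

# Observed 2x2 table: rows = HIV (yes, no), columns = TB (yes, no)
observed = [[24, 31],
            [36, 113]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total   # E = row total x column total / N
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)                # = 1 for a 2x2 table
# Valid only for 1 df: chi-square = Z^2, so p = 2 * (1 - Phi(sqrt(chi2))) = erfc(sqrt(chi2 / 2))
p_value = math.erfc(math.sqrt(chi_square / 2))
print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.4f}")
```

The statistic is about 7.34 on 1 df (p ≈ 0.007), so the null hypothesis of no association between HIV and TB is rejected at the 5% level.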
Select a Statistical Test

Type of data                     Measurement from normal population            Measurement from non-normal population
Describe one group               Mean, SD                                      Median, interquartile range
Compare two independent groups   Independent sample t test (unpaired t test)   Mann-Whitney U test (or) Wilcoxon rank sum test
Compare two paired groups        Paired t test                                 Wilcoxon signed rank test
Thank You
