Sampling Methods
In this article, we discuss how to calculate the minimum size of a sample that
will be used to estimate a proportion within a given population. Important
decisions are often made based on the proportions obtained through an
internet survey. For example, if the proportion of participants who have an
interest in a new product or service is high enough, investments will be made
to bring it to market.
A proportion is the share of individuals within a population that have a
certain characteristic from a set of possible characteristics. Proportions are
generally expressed as percentages. An example is the proportion of
individuals who are willing to pay for a new product or service.
It is often impossible to calculate the exact proportion across all the individuals that
form a population. The population as a whole might contain thousands, even hundreds
of thousands of individuals. In such a case, we need to calculate the
proportion within a sample of the population. The proportion within the sample
must be an adequate substitute for the proportion within the whole population.
The size of the sample used to study a proportion within a population is a
critical factor in obtaining reliable results about the proportion in the whole
population. The reliability of the data is never absolute, but is expressed as a
confidence interval. The narrower the interval must be, or the lower the error
rate must be, the larger the sample size must be in order to get an accurate
reflection of the proportion within the population as a whole.
The following formula is used to calculate the size of the required sample:
$$n = \frac{z^2 \, p \, (1 - p)}{d^2}$$
n = sample size
z = z-score from the standard normal distribution for the chosen level of
confidence (for a level of confidence of 95%, z = 1.96; for a level of
confidence of 99%, z = 2.575)
p = estimated proportion of the population that presents the characteristic
(when unknown we use p = 0.5)
d = tolerated margin of error (for example, d = 0.05 if we want to know the
real proportion within 5%)
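The z values quoted above are rounded table values. As a minimal sketch, assuming a two-sided confidence interval, they can be recovered from the standard normal distribution with Python's standard library. The function name `z_for_confidence` is our own and does not come from the article.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Return the two-sided z-score for a confidence level such as 0.95."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # A two-sided interval leaves (1 - confidence) / 2 in each tail.
    upper_tail_point = 1 - (1 - confidence) / 2
    return NormalDist().inv_cdf(upper_tail_point)


if __name__ == "__main__":
    for level in (0.95, 0.99):
        print(level, round(z_for_confidence(level), 3))
```

The 99% value comes out slightly above the 2.575 used in the text because of rounding in printed tables; the difference has little practical effect on the sample size.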
Examples
1) To calculate a proportion with a 95% level of confidence and a margin of
error of 5%, taking p = 0.5 so that p(1 − p) = 1/4, we obtain
$$n = \frac{(1.96)^2}{4 \, (0.05)^2} = 384.16$$
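The calculation can be checked with a few lines of Python. This is a minimal sketch of the formula given above; the function name and the default values (z = 1.96, p = 0.5, d = 0.05) are chosen here for illustration only.

```python
import math


def sample_size_for_proportion(z: float = 1.96, p: float = 0.5, d: float = 0.05) -> float:
    """Minimum sample size n = z^2 * p * (1 - p) / d^2 (unrounded)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if d <= 0:
        raise ValueError("d must be positive")
    return z ** 2 * p * (1 - p) / d ** 2


if __name__ == "__main__":
    n = sample_size_for_proportion()
    print(round(n, 2))   # unrounded result of the formula
    print(math.ceil(n))  # participants needed, rounded up to a whole person
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.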
Conclusion
In order to conduct a reliable market research analysis using an online survey,
it is preferable to conduct the internet survey with at least 400 participants. If
your budget is more limited, the survey can be conducted with 200
participants, but the results will be less accurate. You can use the Interceptum
market research platform to create and deploy online surveys. Interceptum
offers advanced analysis capabilities.
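To see how much accuracy is lost with a smaller sample, the formula can be inverted to give the margin of error for a given number of participants. The sketch below assumes the same 95% confidence level (z = 1.96) and the worst case p = 0.5; the function name is ours.

```python
import math


def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Margin of error d = z * sqrt(p * (1 - p) / n) for a sample of size n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for participants in (400, 200):
        print(participants, f"{margin_of_error(participants):.1%}")
```

Under these assumptions, 400 participants give a margin of about 4.9%, while 200 participants give about 6.9%.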
www.slideshare.net/zubis/sample-size-13281869
Sample size
1. SAMPLE SIZE DETERMINATION BY DR ZUBAIR K.O. DEPT OF MEDICAL
MICROBIOLOGY. NHA MBBS(IL), SR II
2. OUTLINE • Our take home • What is sample size? • What is sample size
determination? • How large a sample do I need? • What are the methods of determining it? •
What are the factors that affect it? • Mind my language • How do you determine it? • How do
you use it? • A final word
3. OUR TAKE HOME At the end of this presentation, we should be able to: understand the
significance of sample size; determine sample size; understand factors that may affect
sample size; use sample size in our research or study.
4. WHAT IS SAMPLE SIZE? A sample is the sub-population to be studied in order to make an
inference to a reference population (a broader population to which the findings from a study
are to be generalized); the sample size is the number of units it contains. In a census, the
sample size is equal to the population size. However, in research, because of time and budget
constraints, a representative sample is normally used. The larger the sample size, the more
accurate the findings from a study.
5. Availability of resources sets the upper limit of the sample size, while the required
accuracy sets the lower limit. Therefore, an optimum sample size is an
essential component of any research.
7. WHAT IS SAMPLE SIZE DETERMINATION Sample size determination is the
mathematical estimation of the number of subjects/units to be included in a study. When a
representative sample is taken from a population, the findings are generalized to the
population. Optimum sample size determination is required for the following reasons: 1. To
allow for appropriate analysis. 2. To provide the desired level of accuracy. 3. To allow
valid tests of significance.
8. HOW LARGE A SAMPLE DO I NEED? If the sample is too small: 1. Even a well-conducted
study may fail to answer its research question. 2. It may fail to detect important
effects or associations. 3. It may estimate these effects or associations imprecisely.
9. CONVERSELY If the sample size is too large: 1. The study will be difficult and costly.
2. Time constraints. 3. Available cases may be limited, e.g. for a rare disease. 4. Loss of
accuracy. Hence, the optimum sample size must be determined before commencement of a study.
10. MIND MY LANGUAGE • Random error • Systematic error (bias) • Precision (reliability) •
Accuracy (validity) • Null hypothesis • Alternative hypothesis • Type I (α) error •
Type II (β) error • Power (1 − β) • Effect size • Design effect
11. Random error: error that occurs by chance. Sources are sample variability, subject-to-
subject differences and measurement errors. It can be reduced by averaging, increasing the
sample size, or repeating the experiment. Systematic error: deviations not due to chance alone.
Several factors, e.g. patient selection criteria, may contribute. It can be reduced by good study
design and conduct of the experiment. Precision: the degree to which a variable has the
same value when measured several times. It is a function of random error. Accuracy: the
degree to which a variable actually represents the true value. It is a function of systematic
error.
13. Null hypothesis: It states that there is no difference among groups or no association
between the predictor and the outcome variable. This hypothesis needs to be tested.
Alternative hypothesis: It contradicts the null hypothesis. If the alternative hypothesis cannot
be tested directly, it is accepted by exclusion if the test of significance rejects the null
hypothesis. There are two types: one-tailed (one-sided) or two-tailed (two-sided).
14. Type I (α) error: It occurs if an investigator rejects a null hypothesis that is actually true
in the population. The probability of making a type I error is called the level of significance
and is conventionally set at 0.05 (5%). It is specified as Zα in sample size computation; Zα is
the value from the standard normal distribution corresponding to α. The smaller the tolerated
type I error, the larger the required sample size. Type II (β) error: It occurs if the
investigator fails to reject a null hypothesis that is actually false in the population. It is
specified as Zβ in sample size computation; Zβ is the value from the standard normal
distribution corresponding to β.
15. Power (1 − β): This is the probability that the test will correctly identify a significant
difference, effect or association in the sample should one exist in the population. The power
of a study increases with its sample size: the larger the sample size, the greater the power
to detect a significant difference, effect or association. Effect size: a measure of the
strength of the relationship between two variables in a population. It is
the magnitude of the effect under the alternative hypothesis. The bigger the size of the effect
in the population, the easier it will be to find.
16. Design effect: Geographic clustering is generally used to make the study easier and
cheaper to perform. The effect on the sample size depends on the number of clusters and the
variance between and within the clusters. In practice, this is determined from previous studies
and is expressed as a constant called ‘design effect’, often between 1.0 and 2.0. The sample
sizes for simple random samples are multiplied by the design effect to obtain the sample size
for the cluster sample.
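The multiplication described on this slide can be sketched as follows. The function name and the example figures (a simple random sample of 385 and a design effect of 1.5) are hypothetical and are not taken from the presentation.

```python
import math


def cluster_sample_size(n_simple_random: float, design_effect: float) -> int:
    """Inflate a simple-random-sample size by the design effect, rounding up."""
    if n_simple_random <= 0 or design_effect <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(n_simple_random * design_effect)


if __name__ == "__main__":
    # Hypothetical inputs: 385 subjects under simple random sampling, design effect 1.5.
    print(cluster_sample_size(385, 1.5))
```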
17. Odds ratio: a measure of effect size describing the strength of association or
non-independence between two binary data values. Relative risk (RR): the risk of an event (or
of developing a disease) relative to exposure. Relative risk is the ratio of the probability of
the event occurring in the exposed group to that in the non-exposed group.
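Both measures can be computed from a 2×2 table of exposure against outcome. The sketch below uses the standard textbook definitions; the cell labels a, b, c, d and the example counts are hypothetical and do not come from the slides.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

    a = exposed with event,     b = exposed without event,
    c = non-exposed with event, d = non-exposed without event.
    """
    return (a * d) / (b * c)


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk of the event in the exposed group divided by risk in the non-exposed group."""
    risk_exposed = a / (a + b)
    risk_non_exposed = c / (c + d)
    return risk_exposed / risk_non_exposed


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 exposed and 15 of 100 non-exposed develop the disease.
    print(round(relative_risk(30, 70, 15, 85), 2))
    print(round(odds_ratio(30, 70, 15, 85), 2))
```

With these counts the relative risk is 2.0 while the odds ratio is about 2.43, which illustrates that the two measures agree closely only when the event is rare.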
18. POWER ANALYSIS When the estimated sample size cannot be included in a study,
post-hoc power analysis should be carried out. The probability of correctly rejecting the null
hypothesis is equal to 1 − β, which is called power. The power of a test refers to its ability to
detect what it is looking for: it is the probability of finding an effect, given its size.
Post-hoc power analysis is done after a study has been carried out, to help explain the
results of a study that did not find any significant effects.
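As an illustration of a power calculation, the sketch below uses a common normal approximation for comparing two independent proportions with equal group sizes and a two-sided test. The slide does not give a formula, so this approximation, the function name and the example inputs are all assumptions made here.

```python
from statistics import NormalDist


def two_proportion_power(p1: float, p2: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the normal approximation with unpooled variance and ignores the
    (usually negligible) probability of rejecting in the wrong direction.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    standard_error = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5
    return norm.cdf(abs(p1 - p2) / standard_error - z_alpha)


if __name__ == "__main__":
    # Hypothetical study: response rates of 40% vs 50%.
    for n in (184, 400):
        print(n, round(two_proportion_power(0.40, 0.50, n), 2))
```

Under this approximation, 184 subjects per group give a power of roughly one half for a 10-point difference, while 400 per group give a power of a little over 0.8.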
19. AT WHAT STAGE CAN SAMPLE SIZE BE ADDRESSED? It can be addressed at two
stages: 1. Calculate the optimum sample size required during the planning stage, while
designing the study, using the appropriate approach and information on some parameters. 2. Or
through post-hoc power analysis at the stage of interpretation of the results.
20. APPROACH FOR ESTIMATING SAMPLE SIZE/POWER ANALYSIS Approaches for
estimating sample size and performing power analysis depend primarily on: 1. The study
design. 2. The main outcome measure of the study. There are distinct approaches for
calculating sample size for different study designs and different outcome measures.
21. 1. THE STUDY DESIGN There are many different approaches for calculating the
sample size for different study designs, such as case-control design, cohort design, cross-
sectional studies, clinical trials, diagnostic test studies, etc. Within each study design there
could be further sub-designs, and the sample size calculation will vary accordingly.
Therefore, one must use the correct approach for computing the sample size appropriate to
the study design and its subtype.
22. 2. PRIMARY OUTCOME MEASURE The primary outcome measure is usually reflected in the primary
research question of the study and also depends on the study design. For estimating the risk
in a case-control study, it will be the odds ratio, while for a cohort study it will be the
relative risk. For a case-control study, it could also be the difference in means/proportions of
exposure in cases and controls, the crude/adjusted odds ratio, etc. Hence, while calculating
sample size, one of these primary outcome measures has to be specified, because there are
distinct approaches for calculating the sample size for each.
23. STATISTICAL INFERENCE FROM THE STUDY RESULTS In addition, there are different procedures
for calculating sample size for the two approaches of drawing statistical inference from the study
results, i.e. 1. Estimation (confidence interval approach). 2. Hypothesis testing (test of
significance approach). A researcher needs to select the appropriate procedure for
computing the sample size and accordingly use that approach for drawing statistical inference
subsequently. NB: Tests of significance: chi-squared, t-test, Z-test, F-test, P-value.
24. ADDITIONAL PARAMETERS Depending upon the approach chosen for calculating the
sample size, one also needs to specify some additional parameters, such as: • Hypothesis
• Precision • Type I error • Type II error • Power • Effect size • Design effect
25. PROCEDURE FOR CALCULATING SAMPLE SIZE There are four procedures that
could be used for calculating sample size: 1. Use of formulae. 2. Ready-made tables. 3.
Nomograms. 4. Computer software.
26. USE OF FORMULAE FOR SAMPLE SIZE CALCULATION & POWER ANALYSIS
There are many formulae for calculating sample size and power in different situations for
different study designs. The appropriate sample size for a population-based study is
determined largely by 3 factors: 1. The estimated prevalence of the variable of interest. 2.
The desired level of confidence. 3. The acceptable margin of error.
27. To calculate the minimum sample size required for accuracy in estimating proportions,
the following decisions must be taken: 1. Decide on a reasonable estimate of the key proportions
(p) to be measured in the study. 2. Decide on the degree of accuracy (d) that is desired in the
study, typically 1% to 5% (0.01 to 0.05). 3. Decide on the confidence level (Z) you want to use,
usually 95% (Z = 1.96). 4. Determine the size (N) of the population that the sample is supposed
to represent. 5. Decide on the minimum difference for which you expect to find statistical
significance.
28. $n = \frac{z^2 p q}{d^2}$, where q = 1 − p and z is usually set at 1.96 (occasionally at 2.0)
29. E.g. if the proportion of a target population with a certain characteristic is 0.50, the Z
statistic is 1.96 and we desire accuracy at the 0.05 level, then the sample size is
$n = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} \approx 384$.
30. If the study population is < 10,000: $n_f = \frac{n}{1 + n/N}$, where n_f = desired sample
size when the study population is < 10,000; n = desired sample size when the study population
is > 10,000; N = estimate of the population size. Example: if n were found to be 400 and the
population size were estimated at 1000, then n_f is calculated as follows:
$n_f = \frac{400}{1 + 400/1000} = \frac{400}{1.4} \approx 286$
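The correction for a small study population can be sketched in Python as follows. The function name is our own; the inputs reproduce the worked example on the slide (n = 400, N = 1000).

```python
def finite_population_sample_size(n: float, population_size: float) -> float:
    """Adjusted sample size nf = n / (1 + n / N) for a small study population."""
    if n <= 0 or population_size <= 0:
        raise ValueError("n and population_size must be positive")
    return n / (1 + n / population_size)


if __name__ == "__main__":
    nf = finite_population_sample_size(400, 1000)
    print(round(nf))  # rounded to the nearest whole subject, as on the slide
```

As the population size grows, n/N approaches zero and the adjusted size approaches the uncorrected n, which is why the correction is only applied to small populations.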
31. SAMPLE SIZE FORMULA FOR COMPARISON OF GROUPS If we wish to test a
difference (d) between two sub-samples regarding a proportion and can assume an equal
number of cases (n1 = n2 = n') in the two sub-samples, the formula for n' is
$n' = \frac{2 z^2 p q}{d^2}$, with q = 1 − p. E.g. suppose we want to compare an experimental
group against a control group with regard to women using contraception. If we expect p to be
0.40 and wish to conclude that an observed difference of 0.10 or more is significant at the
0.05 level, the sample size will be:
$n' = \frac{2 (1.96)^2 (0.4)(0.6)}{(0.1)^2} \approx 184$. Thus, 184 experimental subjects and
another 184 control subjects are required.
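The two-group formula can be checked numerically. This is a minimal sketch of the expression on the slide; the function name is ours, and the default z = 1.96 corresponds to the 0.05 significance level used in the example.

```python
def per_group_sample_size(p: float, d: float, z: float = 1.96) -> float:
    """Per-group sample size n' = 2 * z^2 * p * q / d^2, with q = 1 - p (unrounded)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if d <= 0:
        raise ValueError("d must be positive")
    q = 1 - p
    return 2 * z ** 2 * p * q / d ** 2


if __name__ == "__main__":
    n_prime = per_group_sample_size(p=0.40, d=0.10)
    print(round(n_prime, 1))  # unrounded value of the formula
    print(round(n_prime))     # the slide reports this value for each group
```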
32. USE OF READY-MADE TABLE FOR SAMPLE SIZE CALCULATION How large a
sample of patients should be followed up if an investigator wishes to estimate the incidence
rate of a disease to within 10% of its true value with 95% confidence? The table shows that
for e = 0.10 and a confidence level of 95%, a sample size of 385 would be needed. This table
can be used to calculate the sample size for other values of the relative
precision and confidence level, e.g. if the level of confidence is reduced to 90%, then the
sample size would be 271. Such tables, which give ready-made sample sizes, are available for
different designs and situations.
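The slide does not state the formula behind the table. As a sketch, assuming the usual relative-precision formula for estimating an incidence rate, n = (z / e)², both quoted table values can be reproduced. The function name and the choice of formula are assumptions made here, not something given in the presentation.

```python
import math
from statistics import NormalDist


def incidence_rate_sample_size(relative_precision: float, confidence: float) -> int:
    """Sample size n = (z / e)^2, rounded up, under the assumed relative-precision formula."""
    if relative_precision <= 0 or not 0 < confidence < 1:
        raise ValueError("invalid relative precision or confidence level")
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / relative_precision) ** 2)


if __name__ == "__main__":
    for level in (0.95, 0.90):
        print(level, incidence_rate_sample_size(0.10, level))
```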
34. USE OF NOMOGRAM FOR SAMPLE SIZE CALCULATION To use a nomogram to
calculate the sample size, one needs to specify the study group (group 1) and the control group
(group 2). The assignment could be arbitrary or based on the study design; the nomogram will
work either way. The researcher should then decide the effect size that is clinically important
to detect. This should be expressed in terms of the % change in the response rate compared with
that of the control group.
35. E.g. if 40% of patients treated with standard therapy are cured and one wants to know
whether a new drug can cure 50%, one is looking for a 25% increase in cure rate:
(50% − 40%) / 40% = 25%.
37. USE OF COMPUTER SOFTWARE FOR SAMPLE SIZE CALCULATION & POWER
ANALYSIS The following software can be used for calculating sample size and power: • Epi-info
• nQuery • Power & Precision • Sample • STATA • SPSS
38. Epi-info for sample size determination In STATCALC: 1. Select SAMPLE SIZE &
POWER. 2. Select POPULATION SURVEY. 3. Enter the size of the population (e.g. 15 000).
4. Enter the expected frequency (an estimate of the true prevalence, e.g. 80% ± your
minimum standard). 5. Enter the worst acceptable result (e.g. 75%), i.e. the margin of error is
5%.
39. How to use sample size formulae Steps: 1st, formulate a research question. 2nd, select
the appropriate study design, primary outcome measure and statistical significance. 3rd, use the
appropriate formula to calculate the sample size.
40. Finally Sample size determination is one of the most essential components of every
research project/study. The larger the sample size, the higher the degree of accuracy, but this
is limited by the availability of resources. It can be determined using formulae, ready-made
tables, nomograms or computer software.
41. STILL CONFUSED? Smart people don’t do it alone. Call a statistician for: • Sample
selection • Sample size determination • Analysis of data