STAT 311 - Lesson 2
Data Collection
STAT 311: Statistical Analysis w/ Software Application
Jayson R. Sarin
Faculty, Mathematics and Science Division
jrsarin@urios.edu.ph
“Data is a precious thing and will last longer
than the systems themselves.”
- Tim Berners-Lee
Data Collection
Consequences from Improperly Collected Data
Steps in Data Gathering
Source of Data
Methods in Collecting Primary Data
Direct personal interviews - The researcher has direct contact with the interviewee and gathers
information by asking the interviewee questions.
Indirect/Questionnaire Method - The researcher gathers information through written responses to a
prepared set of questions (a questionnaire) that respondents answer without direct contact with the researcher.
A focus group is a group interview of approximately six to twelve people who share similar characteristics
or common interests. A facilitator guides the group based on a predetermined set of topics.
Experiment is a method of collecting data where there is direct human intervention on the conditions that
may affect the values of the variable of interest.
Observation is a technique that involves systematically selecting, watching, and recording the behaviors of
people or other phenomena, and aspects of the setting in which they occur, for the purpose of gaining
specified information. It includes all methods, from simple visual observation to the use of high-level
machines and measurements.
Methods in Collecting Secondary Data
Criteria to Determine Appropriate Sample Size
1. Level of Precision
Also called sampling error, the level of precision is the range in which the true value of
the population is estimated to lie. It is often expressed in percentage points, for example ±5%.
2. Confidence Level
It is a statistical measure of the number of times out of 100 that results can be expected to
fall within a specified range. For example, a confidence level of 90% means that, if the sampling
were repeated many times, about 90 out of 100 samples would give a result within the stated
level of precision of the true population value.
3. Degree of Variability
The degree of variability depends on the target population and the attributes under
consideration. The more heterogeneous a population is, the larger the sample size required
to attain a given level of precision.
Methods in Determining the Sample Size
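$$n = \left(\frac{Z\sigma}{e}\right)^2$$
where:
▻ n is the sample size
▻ Z is the z-score corresponding to the level of confidence
▻ σ is the population standard deviation
▻ e is the level of precision.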
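The z-score Z can also be computed rather than read from a z-table. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `z_score` is a hypothetical helper, and a two-sided interval is assumed.

```python
from statistics import NormalDist


def z_score(confidence: float) -> float:
    """Two-sided critical value Z for a confidence level such as 0.95."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Leave (1 - confidence) / 2 in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_score(level):.3f}")
```

Printed z-tables usually round these values to two decimal places, which is why 1.96 and 2.58 are the figures commonly quoted for 95% and 99% confidence.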
Example : Estimating the Mean or Average
A soft drink machine is regulated so that the amount of drink dispensed is approximately normally distributed
with a standard deviation equal to 0.5 ounce. Determine the sample size needed if we wish to be 95% confident
that our sample mean will be within 0.03 ounce from the true mean.
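Solution: The z-score for a 95% confidence level in the z-table is 1.96. With σ = 0.5 and e = 0.03,
$$n = \left(\frac{Z\sigma}{e}\right)^2 = \left(\frac{(1.96)(0.5)}{0.03}\right)^2 \approx 1067.11$$
Rounding up, a sample of n = 1068 is needed.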
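The sample size for this example, n = (Zσ/e)², can be checked with a short script. This is a minimal sketch and not part of the original lesson; the function name `sample_size_for_mean` is hypothetical, and the result is rounded up because a fraction of an observation cannot be sampled.

```python
import math


def sample_size_for_mean(z: float, sigma: float, e: float) -> int:
    """Smallest whole n satisfying n >= (z * sigma / e) ** 2."""
    if sigma <= 0 or e <= 0:
        raise ValueError("sigma and e must be positive")
    return math.ceil((z * sigma / e) ** 2)


# Soft drink machine: 95% confidence (Z = 1.96), sigma = 0.5 oz, e = 0.03 oz.
print(sample_size_for_mean(z=1.96, sigma=0.5, e=0.03))  # 1068
```

Doubling the allowed error to 0.06 ounce cuts the required sample to roughly a quarter, since e enters the formula squared.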
Estimating Proportion (Infinite Population)
▰ For large populations, Cochran developed the following formula for calculating the
sample size when the population is infinite:
$$n_0 = \frac{Z^2 p q}{e^2}$$
where:
▻ n₀ is the sample size,
▻ Z is the selected critical value of the desired confidence level,
▻ p is the estimated proportion of an attribute that is present in the
population, and q = 1 - p,
▻ and e is the desired level of precision.
Example:
Suppose we are doing a study on the inhabitants of a large town and want to find out what proportion of
households serve breakfast in the mornings. We want 99% confidence and a level of precision of 1%.
Solution:
We don’t have much information on the subject to begin with, so we assume that
half of the families serve breakfast (p = 0.5, q = 0.5). This gives maximum variability,
which is the safe assumption whenever p is unknown.
The z-score for a 99% confidence level in the z-table is 2.58, and e = 0.01, so
$$n_0 = \frac{Z^2 p q}{e^2} = \frac{(2.58)^2(0.5)(0.5)}{(0.01)^2} = 16{,}641$$
If a problem does not state a confidence level or a level of precision, assume the following:
Confidence level is 95% (Z = 1.96).
The level of precision is 0.05.
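Cochran's formula for this example, n₀ = Z²p(1 - p)/e², can be evaluated with a short script. This is a minimal sketch and not part of the original lesson: the function name `cochran_sample_size` is hypothetical, and inputs are passed as decimal strings so that `fractions.Fraction` can do exact arithmetic before rounding up (an assumption made here to avoid floating-point error near whole numbers).

```python
import math
from fractions import Fraction


def cochran_sample_size(z: str, e: str, p: str = "0.5") -> int:
    """Cochran's n0 = Z^2 * p * (1 - p) / e^2, rounded up to a whole unit."""
    z_, e_, p_ = Fraction(z), Fraction(e), Fraction(p)
    if not 0 < p_ < 1 or e_ <= 0:
        raise ValueError("need 0 < p < 1 and e > 0")
    n0 = z_ ** 2 * p_ * (1 - p_) / e_ ** 2
    return math.ceil(n0)


# Breakfast example: 99% confidence (Z = 2.58), 1% precision, p = 0.5.
print(cochran_sample_size(z="2.58", e="0.01"))  # 16641

# Fallback values when nothing is stated: 95% confidence, e = 0.05.
print(cochran_sample_size(z="1.96", e="0.05"))  # 385
```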
Finite Population Correction
▰ If the population is small, then the sample size can be reduced slightly.
▰ Cochran’s formula for calculating the sample size when the population size is finite:
$$n = \frac{n_0}{1 + \frac{n_0 - 1}{N}}$$
where:
▻ n₀ is Cochran’s sample size recommendation,
▻ N is the population size,
▻ n is the reduced sample size.
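Example:
▰ Using the problem above, suppose the population we want to study consists of 20,000 inhabitants.
Solution: With n₀ = 16,641 (from Z = 2.58, p = 0.5, e = 0.01) and N = 20,000,
$$n = \frac{n_0}{1 + \frac{n_0 - 1}{N}} = \frac{16{,}641}{1 + \frac{16{,}640}{20{,}000}} \approx 9083.5$$
Rounding up, a sample of n = 9084 is needed.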
▰ As you can see, this adjustment (called the finite population correction) can
substantially reduce the necessary sample size for small populations.
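The finite population correction, n = n₀ / (1 + (n₀ - 1)/N), can be sketched in a few lines. The function name `finite_population_correction` is hypothetical, and exact fractions are used before rounding up; this is an illustration under those assumptions rather than a definitive implementation.

```python
import math
from fractions import Fraction


def finite_population_correction(n0: int, population: int) -> int:
    """Reduced sample size n = n0 / (1 + (n0 - 1) / N), rounded up."""
    if n0 < 1 or population < 1:
        raise ValueError("n0 and population must be positive integers")
    n = Fraction(n0) / (1 + Fraction(n0 - 1, population))
    return math.ceil(n)


# Breakfast example: n0 = 16641 (Z = 2.58, p = 0.5, e = 0.01), N = 20000.
print(finite_population_correction(n0=16641, population=20000))  # 9084
```

When N is very large relative to n₀, the denominator is close to 1 and the correction leaves the sample size essentially unchanged.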
Simplified Formula For Proportions
▰ Slovin’s formula (also called Yamane’s formula) is used to calculate the sample size n given
the population size and the margin of error.
▰ According to Yamane, for a 95% confidence level and p = 0.5, the sample size
should be computed as
$$n = \frac{N}{1 + N e^2}$$
where:
▻ N is the total population,
▻ e is the level of precision,
▻ n is the sample size.
Example 1
A researcher plans to conduct a survey about food preference of BS Stat students. If the
population of students is 1000, find the sample size if the error is 5%.
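Solution: With N = 1000 and e = 0.05,
$$n = \frac{N}{1 + N e^2} = \frac{1000}{1 + 1000(0.05)^2} = \frac{1000}{3.5} \approx 285.71$$
Rounding up, the sample size is n = 286.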
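Slovin's formula, n = N / (1 + Ne²), is simple enough to check with a short script. This is a minimal sketch; the function name `slovin_sample_size` is hypothetical, and the error is passed as a decimal string so that the arithmetic is exact before rounding up.

```python
import math
from fractions import Fraction


def slovin_sample_size(population: int, e: str) -> int:
    """Slovin's (Yamane's) n = N / (1 + N * e^2), rounded up."""
    e_ = Fraction(e)
    if population < 1 or not 0 < e_ < 1:
        raise ValueError("need population >= 1 and 0 < e < 1")
    return math.ceil(population / (1 + population * e_ ** 2))


# BS Stat survey: N = 1000 students, 5% error.
print(slovin_sample_size(population=1000, e="0.05"))  # 286
```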
Example 2:
Things to remember!
▰ Use the formula for estimating the mean if the problem gives the value of the standard deviation (σ);
if it does not, the mean should be given so that the sample standard deviation s can be solved for and used instead.
▰ Cochran’s formula is used for an infinite or “large” population, meaning that no specific value is given
for the population size. If a target population size is given, reduce the sample size with the finite population
correction. If the confidence level is 95%, Slovin’s formula can be used directly:
with p = 0.5 and Z rounded to 2, the two formulas give approximately the same result.
▰ Use Slovin’s formula if the problem directly gives only the values of N and e. This simply means that we
don’t know anything else about the population.
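The relationship between Slovin's formula and Cochran's formula with the finite population correction can be checked numerically. The sketch below uses hypothetical helper names (`slovin` and `cochran_fpc`), assumes p = 0.5, and returns unrounded values so the two results can be compared directly.

```python
import math
from fractions import Fraction


def slovin(population: int, e: str) -> Fraction:
    """Unrounded Slovin value N / (1 + N * e^2)."""
    return population / (1 + population * Fraction(e) ** 2)


def cochran_fpc(population: int, e: str, z: str, p: str = "0.5") -> Fraction:
    """Unrounded Cochran n0 followed by the finite population correction."""
    p_ = Fraction(p)
    n0 = Fraction(z) ** 2 * p_ * (1 - p_) / Fraction(e) ** 2
    return n0 / (1 + (n0 - 1) / population)


N, e = 1000, "0.05"
print(f"Slovin:             {float(slovin(N, e)):.2f}")                 # 285.71
print(f"Cochran, Z = 2:     {float(cochran_fpc(N, e, z='2')):.2f}")     # 285.92
print(f"Cochran, Z = 1.96:  {float(cochran_fpc(N, e, z='1.96')):.2f}")  # 277.74
```

For this N and e, both Slovin's value and Cochran's value with Z = 2 round up to 286, while Z = 1.96 gives 278, so the agreement at 95% confidence is approximate and depends on rounding Z to 2.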