STAT 311 - Lesson 2

1) Data collection is the process of systematically gathering and measuring information about variables of interest to answer research questions and test hypotheses. Proper planning is needed to collect relevant and useful data. 2) Improper data collection can lead to inability to answer research questions accurately, inability to repeat studies, and misleading findings. 3) Key steps in data collection include setting objectives, determining needed data, designing collection methods and forms, collecting data, and ensuring data quality. Primary and secondary sources provide first-hand and analyzed information respectively.

Uploaded by

MUAJIER MINGA-AS
Copyright
© © All Rights Reserved

Father Saturnino Urios University

Arts and Sciences Program

Data Collection
STAT 311: Statistical Analysis w/ Software Application

Jayson R. Sarin
Faculty, Mathematics and Science Division
jrsarin@urios.edu.ph
 
“Data is a precious thing and will last longer
than the systems themselves.”

- Tim Berners-Lee

Data Collection

▰ Data collection is the process of gathering and measuring information on variables
of interest, in an established systematic fashion that enables one to answer stated
research questions, test hypotheses, and evaluate outcomes.
▰ Without proper planning for data collection, a number of problems can occur. If the
data collection steps and processes are not properly planned, the research project
can ultimately end up with a data set that does not serve the purpose for which it
was intended.

  
Consequences from Improperly Collected Data

▰ Inability to answer research questions accurately.
▰ Inability to repeat and validate the study.
▰ Distorted findings resulting in wasted resources.
▰ Misleading other researchers to pursue fruitless avenues of investigation.
▰ Compromising decisions for public policy.
▰ Causing harm to human participants and animal subjects.

Steps in Data Gathering

✓ Set the objectives for collecting data.
✓ Determine the data needed based on the set objectives.
✓ Determine the method to be used in data gathering and define the comprehensive
data collection points.
✓ Design the data gathering forms to be used.
✓ Collect the data.
The challenge is to find methods that yield information that is cost-effective, relevant, timely,
and important for immediate use. Some methods prioritize timeliness and low cost;
others prioritize accuracy and the scientific rigor of the method.

Source of Data

Whether conducting research in the social sciences, humanities, arts, or natural
sciences, the ability to distinguish between primary and secondary sources is essential.
▰ Primary Sources - Provide a first-hand account of an event or time period and are considered
to be authoritative. They represent original thinking, report on discoveries or events, or
share new information. Often these sources are created at the time the events occurred, but
they can also include sources that are created later. They are usually the first formal
appearance of original research.
▰ Primary Data - Data documented by the primary source; the data collectors recorded
the data themselves.
✓ First-hand information obtained by the investigator is generally more reliable and accurate,
since the investigator can obtain the correct information by clearing up any doubts
respondents may have about certain questions.
▰ Secondary Sources - Offer an analysis, interpretation, or restatement of primary
sources and are considered to be persuasive. They often involve
generalization, synthesis, interpretation, commentary, or evaluation in an attempt
to convince the reader of the creator's argument. They often attempt to describe or
explain primary sources.
▰ Secondary Data - Data documented by a secondary source; the data collectors
obtained the data from records kept by other sources.
✓ Data are primary for the agency that collected them and become
secondary for anyone else who uses them for their own purposes.
✓ Secondary data are less expensive to collect, in both money and time.

Methods in Collecting Primary Data

✓ Direct personal interviews - The researcher has direct contact with the interviewee and
gathers information by asking the interviewee questions.
✓ Indirect/Questionnaire Method - The researcher gathers information through a written set of
questions that respondents answer themselves, without direct contact with the researcher.
✓ Focus group - A group interview of approximately six to twelve people who share similar characteristics
or common interests. A facilitator guides the group based on a predetermined set of topics.
✓ Experiment - A method of collecting data where there is direct human intervention on the conditions that
may affect the values of the variable of interest.
✓ Observation - A technique that involves systematically selecting, watching, and recording the behaviors of
people or other phenomena and aspects of the setting in which they occur, for the purpose of
gaining specified information. It includes all methods, from simple visual observation to the use of
high-level machines and measurements.

Methods in Collecting Secondary Data

✓ Published reports in newspapers and periodicals.
✓ Financial data reported in annual reports.
✓ Records maintained by the institution.
✓ Internal reports of government departments.
✓ Information from official publications.
Take note!
▰ Always investigate the validity and reliability of the data by examining the collection method
employed by your source.
▰ Do not use inappropriate data for your research.
▰ The choice of methods of data collection is largely based on the accuracy of the information
they yield.
Sample Size

“How many participants should be chosen for a survey?”

▰ One of the most frequent problems in statistical analysis is determining the
appropriate sample size. One may ask why sample size is so important.
✓ An appropriate sample size is required for validity.
✓ If the sample size is too small, it will not yield valid results. An appropriate sample size
produces accurate results.
✓ A sample size that is too large wastes money and time, because an adequately sized
sample normally already gives an accurate result.
✓ The sample size is typically denoted by n and is always a positive integer. No single
sample size is right for every study; it varies across research settings.

Criteria to Determine Appropriate Sample Size

1. Level of Precision
✓ Also called sampling error, the level of precision is the range in which the true value of
the population is estimated to be.
2. Confidence Level
✓ A statistical measure of the number of times out of 100 that results can be expected to
be within a specified range. For example, a confidence level of 90% means that, if the sampling
were repeated many times, about 90 out of 100 samples would give a range that contains the
true population value.

Criteria to Determine Appropriate Sample Size

✓ The z-score to use depends on the confidence level. Commonly used values are 1.645 (90%),
1.96 (95%), and 2.58 (99%).

3. Degree of Variability
✓ The degree of variability depends on the target population and the attributes under
consideration. The more heterogeneous a population is, the larger the
sample size required to attain a given level of precision.
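
The z-score for any confidence level can also be computed rather than looked up. The following is a minimal sketch, assuming a two-sided interval and the standard normal distribution; the function name `z_score` is illustrative and not part of the lesson.

```python
from statistics import NormalDist


def z_score(confidence_level: float) -> float:
    """Two-sided critical z-value for a confidence level given as a fraction (e.g. 0.95)."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    # A two-sided interval leaves (1 - confidence_level) / 2 in each tail.
    upper_tail_point = 1 - (1 - confidence_level) / 2
    return NormalDist().inv_cdf(upper_tail_point)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_score(level):.3f}")
```

To three decimals the 99% value is 2.576; the slides use it rounded to 2.58.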
Methods in Determining the Sample Size

Estimating the Mean or Average

✓ The sample size required to estimate the population mean µ at a given
level of confidence with a specified margin of error e is given by

$$n = \left(\frac{Z\sigma}{e}\right)^{2}$$

where:
▻ Z is the z-score corresponding to the level of confidence
▻ σ is the population standard deviation
▻ e is the level of precision (margin of error).

Example: Estimating the Mean or Average

A soft drink machine is regulated so that the amount of drink dispensed is approximately normally distributed
with a standard deviation equal to 0.5 ounce. Determine the sample size needed if we wish to be 95% confident
that our sample mean will be within 0.03 ounce of the true mean.
✓ Solution: The z-score for a 95% confidence level in the z-table is 1.96. With σ = 0.5 and e = 0.03,

$$n = \left(\frac{(1.96)(0.5)}{0.03}\right)^{2} \approx 1067.11$$

✓ Rounding up, we need a sample of 1,068 observations for our study.


Take note!
✓ If the result is a decimal, always round up to the next integer.
✓ If σ is unknown, it is common practice to conduct a preliminary survey to determine s and use it as an
estimate of σ, or to use results from previous studies to obtain an estimate of σ. When using this approach, the
size of the preliminary sample should be at least 30. The sample size is then given by

$$n = \left(\frac{Zs}{e}\right)^{2}$$
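
The calculation for estimating a mean can be sketched in a few lines. This assumes the standard form n = (Zσ/e)², which reproduces the 1,068 stated in the soft drink example; the function name `sample_size_for_mean` is illustrative. Exact fractions are used so that floating-point noise cannot push the rounded-up result off by one.

```python
import math
from fractions import Fraction


def sample_size_for_mean(sigma: float, e: float, z: float = 1.96) -> int:
    """Smallest integer n satisfying n >= (z * sigma / e) ** 2."""
    if sigma <= 0 or e <= 0 or z <= 0:
        raise ValueError("sigma, e, and z must all be positive")
    # Convert via str() so that 0.03 becomes exactly 3/100.
    z_f, sigma_f, e_f = (Fraction(str(v)) for v in (z, sigma, e))
    # Always round up, as the note above requires.
    return math.ceil((z_f * sigma_f / e_f) ** 2)


# Soft drink machine example: sigma = 0.5 oz, e = 0.03 oz, 95% confidence.
print(sample_size_for_mean(sigma=0.5, e=0.03, z=1.96))  # 1068
```

When σ is unknown, the same function can be called with the preliminary-survey estimate s in place of `sigma`.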

Estimating Proportion (Infinite Population)

▰ For large populations, Cochran developed the following formula for calculating
sample size when the population is treated as infinite:

$$n_0 = \frac{Z^{2}\,p\,q}{e^{2}}$$

where:
▻ $n_0$ is the sample size,
▻ Z is the selected critical value of the desired confidence level,
▻ p is the estimated proportion of an attribute that is present in the
population,
▻ q = 1 − p, and e is the desired level of precision.
Example:

Suppose we are doing a study on the inhabitants of a large town and want to find out what proportion of
households serve breakfast in the mornings. We want 99% confidence and at least ±1% precision.
Solution:
✓ We don't have much information on the subject to begin with, so we assume that
half of the families serve breakfast (p = 0.5). This gives maximum variability (when p is unknown,
always assume maximum variability).
✓ The z-score for a 99% confidence level in the z-table is 2.58. With p = 0.5, q = 0.5, and e = 0.01,

$$n_0 = \frac{(2.58)^{2}(0.5)(0.5)}{(0.01)^{2}} = 16{,}641$$

If a problem does not give a confidence level or level of precision, assume:
✓ Confidence level is 95%.
✓ The level of precision is 0.05.
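
Cochran's calculation for a proportion can be sketched as follows, assuming the standard form n₀ = Z²p(1 − p)/e². The function name `cochran_sample_size` is illustrative; its defaults follow the fallback values listed above (95% confidence, e = 0.05) together with the maximum-variability choice p = 0.5.

```python
import math
from fractions import Fraction


def cochran_sample_size(e: float = 0.05, z: float = 1.96, p: float = 0.5) -> int:
    """Cochran sample size for a proportion, treating the population as infinite."""
    if e <= 0 or z <= 0 or not 0 < p < 1:
        raise ValueError("e and z must be positive and p must lie strictly between 0 and 1")
    # Exact fractions keep a result like 16641 from becoming 16641.000000000002.
    z_f, p_f, e_f = (Fraction(str(v)) for v in (z, p, e))
    return math.ceil(z_f**2 * p_f * (1 - p_f) / e_f**2)


# Breakfast example: 99% confidence (z = 2.58), 1% precision, p = 0.5.
print(cochran_sample_size(e=0.01, z=2.58, p=0.5))  # 16641
```

Calling `cochran_sample_size()` with no arguments applies the default assumptions and returns 385.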
Finite Population Correction

▰ If the population is small, the sample size can be reduced slightly.
▰ Cochran's formula for calculating sample size when the population size is finite:

$$n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}$$

where:
✓ $n_0$ is Cochran's sample size recommendation,
✓ N is the population size,
✓ n is the reduced sample size.

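Example:

▰ Using the problem above, suppose the target population is 20,000 inhabitants.
Solution: With $n_0 = 16{,}641$ from Cochran's formula (Z = 2.58, p = 0.5, e = 0.01) and N = 20,000,

$$n = \frac{16{,}641}{1 + \dfrac{16{,}641 - 1}{20{,}000}} \approx 9083.5$$

✓ Rounding up, we need a sample of 9,084.
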
▰ As you can see, this adjustment (called the finite population correction) can
substantially reduce the necessary sample size for small populations.
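
The correction can be sketched as below, assuming the standard form n = n₀ / (1 + (n₀ − 1)/N). The function name is illustrative, and the input 16,641 is what the standard Cochran formula gives for the breakfast example (Z = 2.58, p = 0.5, e = 0.01).

```python
import math
from fractions import Fraction


def finite_population_correction(n0: float, population_size: int) -> int:
    """Reduce an infinite-population sample size n0 for a population of known size."""
    if n0 <= 0 or population_size <= 0:
        raise ValueError("n0 and population_size must be positive")
    n0_f = Fraction(str(n0))
    # n = n0 / (1 + (n0 - 1) / N), rounded up to the next integer.
    return math.ceil(n0_f / (1 + (n0_f - 1) / population_size))


# Breakfast example restricted to a town of 20,000 inhabitants.
print(finite_population_correction(16641, 20000))  # 9084
```

The reduction is large here because the uncorrected sample size is of the same order as the population itself.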

Simplified Formula For Proportions

▰ Slovin's formula, also known as Yamane's formula, is used to calculate the sample size n given
the population size and margin of error.
▰ According to Yamane, for a 95% confidence level and p = 0.5, the size of the sample
should be computed as

$$n = \frac{N}{1 + N e^{2}}$$

where:
✓ N is the total population,
✓ e is the level of precision,
✓ n is the sample size.
Example 1

A researcher plans to conduct a survey about the food preferences of BS Stat students. If the
population of students is 1000, find the sample size if the error is 5%.

Solution:

$$n = \frac{1000}{1 + 1000(0.05)^{2}} = \frac{1000}{3.5} \approx 285.71$$

✓ Rounding up, the researcher needs to survey 286 BS Stat students.
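
Slovin's calculation can be sketched as follows, assuming the standard form n = N / (1 + Ne²), which reproduces the 286 stated above. The function name `slovin_sample_size` is illustrative.

```python
import math
from fractions import Fraction


def slovin_sample_size(population_size: int, e: float) -> int:
    """Slovin/Yamane sample size for a population of known size and margin of error e."""
    if population_size <= 0 or e <= 0:
        raise ValueError("population_size and e must be positive")
    # Convert via str() so that 0.05 becomes exactly 1/20.
    e_f = Fraction(str(e))
    return math.ceil(population_size / (1 + population_size * e_f**2))


# Example 1: 1000 BS Stat students, 5% error.
print(slovin_sample_size(1000, 0.05))  # 286
```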

Example 2:

From the above example with N = 20,000: if we assume a 95% confidence level, or the problem
asks for a 95% confidence level or gives no confidence level at all, then we
can use Slovin's formula.

Things to remember!

▰ Use the formula for estimating the mean if the problem gives the value of the standard deviation (σ).
If it does not, compute the sample standard deviation s from the data provided and use it in place of σ.
▰ Cochran's formula is used for an infinite or “large” population, meaning no specific value is given
for the population size. If a target population is given, reduce the sample size (finite population
correction). If the confidence level is 95%, we can use Slovin's formula directly.
The two formulas give approximately the same result at a 95% confidence level with p = 0.5.
▰ Use Slovin's formula if the problem gives only the values of N and e, that is, when we
don't know anything else about the population.
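
The relationship between the two approaches can be checked numerically. The sketch below assumes the standard forms of both formulas, p = 0.5, and illustrative inputs N = 20,000 and e = 0.05; the function names are not from the lesson. With Z rounded to 2 the two results agree, while with Z = 1.96 they are close but not identical.

```python
import math
from fractions import Fraction


def slovin(population_size: int, e: float) -> int:
    """Slovin/Yamane sample size: N / (1 + N * e^2), rounded up."""
    e_f = Fraction(str(e))
    return math.ceil(population_size / (1 + population_size * e_f**2))


def cochran_with_fpc(population_size: int, e: float, z: float, p: float = 0.5) -> int:
    """Cochran sample size followed by the finite population correction, rounded up."""
    z_f, p_f, e_f = (Fraction(str(v)) for v in (z, p, e))
    n0 = z_f**2 * p_f * (1 - p_f) / e_f**2  # left unrounded before the correction
    return math.ceil(n0 / (1 + (n0 - 1) / population_size))


N, e = 20000, 0.05
print(slovin(N, e))                    # 393
print(cochran_with_fpc(N, e, z=2))     # 393
print(cochran_with_fpc(N, e, z=1.96))  # 377
```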
