
DMDW 4

This document discusses various sampling methods used in data collection and analysis. It begins by defining sampling as selecting a subset of a population to make conclusions about the whole population. It then covers probability sampling methods like simple random sampling, systematic sampling, stratified sampling, cluster sampling, and multi-stage sampling. It also discusses non-probability sampling methods such as quota sampling, purposeful sampling, convenience sampling, and snowball sampling. The document provides examples and explanations of each sampling technique.

Uploaded by

Anu agarwal
Sampling

KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY
School of Computer Engineering

Data Mining and Data Warehousing (CS 2004)
3 Credit
Lecture Note 05

Dr. Amiya Ranjan Panda
Assistant Professor [II]
School of Computer Engineering,
Kalinga Institute of Industrial Technology (KIIT),
Deemed to be University, Odisha


Acknowledgement

Special thanks to J. Han and M. Kamber, and to Tan, Steinbach, and Kumar,
for their slides and books, which I have used in preparing these slides.
Chapter Contents
q Sampling
q Sampling Method
Ø Probability Sampling
ü Simple Random Sampling
ü Systematic Sampling
ü Stratified Sampling
ü Cluster sampling
ü Multi Stage Sampling
Ø Non Probability Sampling
ü Quota Sampling
ü Purposive / Judgemental Sampling
ü Convenience Sampling
ü Referral / Snowball Sampling

q Sampling Error
q Five common types of Sampling Error
Sampling

q Sampling is a technique of selecting individual members, or a subset, of the
population in order to draw statistical conclusions from them and estimate
characteristics of the whole population.
q A sample is a “subgroup of a population”.
q Sampling has also been described as a way of obtaining a group of people or
objects to study that is representative of a larger population or universe of
interest (Stacks & Hocking, 1999).
Sampling Methods

Probability Sampling
• Any element can be chosen randomly from the population. It deals with choosing
the sample randomly.
• The most critical requirement of probability sampling is that everyone in your
population has a known, non-zero chance of getting selected (in simple random
sampling, an equal chance).
• Ex. When an unbiased coin is tossed, the probability of getting a head is ½.
• Ex. When a fair die is thrown, the probability of getting a particular number,
e.g. 6, is 1/6.

Non-Probability Sampling
• Elements are chosen from the population by subjective judgment (purposefully /
intentionally), on the basis of past experience and knowledge rather than random
selection.
• It is a sampling process in which some individual elements of the population
may not have an opportunity to be chosen for the sample.
• For example, one person could have a 10% chance of being selected and another
person could have a 50% chance of being selected.
Probability Sampling

q Simple Random Sampling

q Systematic Sampling

q Stratified Sampling

q Cluster sampling

q Multi Stage Sampling


Simple Random Sampling

q Any element can be chosen at random.
q Selection is entirely randomized.
q No prior knowledge, criteria, or procedure is applied when selecting
the sample from the population.

Example: Suppose we would like to select 10 students from a class of 75
students. Write each student's roll number on a separate chit, put the chits in a
container, and draw 10 chits from the container one by one at random. On the first
draw each student has a 1/75 chance of being picked; over all 10 draws, each
student has a 10/75 chance of ending up in the sample.

Advantage: Every element has an equal chance of being selected as part of the
sample.
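The chit-drawing example above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `simple_random_sample` and the fixed seed are assumptions made for the example, and the draw itself relies on the standard library's `random.sample`, which selects without replacement.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every element is equally likely to be chosen."""
    rng = random.Random(seed)          # seed is only here to make the demo repeatable
    return rng.sample(population, n)   # sampling without replacement

roll_numbers = list(range(1, 76))      # roll numbers 1..75, as in the example
chosen = simple_random_sample(roll_numbers, 10, seed=42)

print(sorted(chosen))
print(f"Chance of any one student being in the sample: {10 / 75:.3f}")
```

Each run with a different seed plays the role of shaking the container and drawing again.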
Systematic Sampling
q Each member of the sample comes after an equal interval from the
previous member.
q All the elements are first arranged in a sequence, where each
element has an equal chance of being selected.
q Select a random starting point and then select individuals at regular
intervals.

Example: Suppose we would like to select 10 students from a class of
75 students. Choose a random starting roll number among the first 7, then choose
every 7th student (interval k = 75/10 ≈ 7) until 10 students are selected.

Advantage: As each student has a chance of being selected, there is no bias in the
selection.
Systematic Sampling (cont.)
q For a sample of size n, we divide our population of size N into subgroups of k
elements, where k = N/n.
q We select our first element randomly from the first subgroup of k elements.
q To select the other elements of the sample, proceed as follows:
Ø If the first element is n1, the second element is n2 = n1 + k.
Ø The third element is n3 = n2 + k, and so on.
q Example with N = 20, n = 5:
Ø The number of elements in each subgroup is k = N/n = 20/5 = 4.
Ø Randomly select the first element from the first subgroup. If we select
n1 = 3, then n2 = n1 + k = 3 + 4 = 7, n3 = n2 + k = 7 + 4 = 11, n4 = 15, and n5 = 19.
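The procedure above (interval k = N/n, random start in the first subgroup, then every k-th element) can be sketched as below. The function name `systematic_sample` and the seed are illustrative assumptions; integer division is used for k, so when N is not a multiple of n the last few elements are never reached.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start in the first k elements, then every k-th element."""
    rng = random.Random(seed)
    k = len(population) // n           # sampling interval k = N/n (rounded down)
    start = rng.randrange(k)           # 0-based position inside the first subgroup
    return [population[start + i * k] for i in range(n)]

elements = list(range(1, 21))          # N = 20, elements numbered 1..20
sample = systematic_sample(elements, 5, seed=1)

print(sample)                          # five elements spaced k = 4 apart
```

If the random start happens to be element 3, the result is 3, 7, 11, 15, 19, matching the worked example.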
Stratified Sampling
q The population is divided into smaller homogeneous groups, or strata,
by some characteristic.
q That is, the elements within a stratum are similar to one another
(homogeneous), while different strata differ from each other (heterogeneous).
q The samples are selected randomly from these strata.
q We need to have prior information about the population to create the
subgroups.

Example: Suppose we would like to select some students from a class of
75 students. The students are divided into two groups, boys and girls.
Then some students are chosen from the boys and some from the girls.

Advantage: Members of each category or group will be chosen without any bias.
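A minimal sketch of the boys/girls example follows. The split of the class into 40 boys and 35 girls, the 20% sampling fraction, and the function name `stratified_sample` are all assumptions made for illustration; the slides do not specify how many students to take from each stratum, so this sketch takes the same fraction from each.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Group items into strata, then draw the same fraction at random from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical class of 75 students: 40 boys and 35 girls.
students = [("boy", i) for i in range(1, 41)] + [("girl", i) for i in range(41, 76)]
picked = stratified_sample(students, lambda s: s[0], fraction=0.2, seed=7)

boys = sum(1 for group, _ in picked if group == "boy")
girls = sum(1 for group, _ in picked if group == "girl")
print(boys, girls)   # 8 boys and 7 girls
```

Because each stratum is sampled separately, neither group can be left out by chance, which a simple random sample of 15 cannot guarantee.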
Cluster Sampling
q From the big population, choose a small group by dividing it into
clusters/sections, e.g. area-wise.
q The clusters are randomly selected.
q All the elements of a selected cluster are used for sampling.

Example: Suppose we would like to know the level of awareness about COVID in a
city. Instead of conducting a detailed survey of the entire city, one can divide
the city into clusters and randomly choose a cluster. All the members of that
cluster will be surveyed.

Cluster sampling can be done in the following ways:
Ø Single Stage Cluster Sampling
Ø Two Stage Cluster Sampling
Single and Two stage Cluster Sampling

q Single Stage Cluster Sampling: Divide the entire population into clusters.
One cluster (or more) is selected randomly, and every element of the selected
clusters is sampled.

q Two Stage Cluster Sampling: Divide the entire population into clusters.
Randomly select two or more clusters, and then randomly select elements from
within those selected clusters.

Example: An airline company wants to survey its customers on a given day, so it
randomly selects 55 flights that day and surveys every passenger on those flights.
This is single stage cluster sampling, since every passenger on the selected
flights is surveyed.
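The difference between the two variants can be sketched as below, using a scaled-down version of the airline example. The data (6 flights of 20 passengers), the function names, and the seed are hypothetical and chosen only to keep the output small.

```python
import random

def single_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick clusters, then randomly pick members inside each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        sample.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return sample

# Hypothetical data: 6 flights with 20 passengers each.
flights = {
    f"flight_{f}": [f"flight_{f}_passenger_{p}" for p in range(1, 21)]
    for f in range(1, 7)
}

one_stage = single_stage_cluster_sample(flights, 2, seed=3)   # 2 flights, everyone
two_stage = two_stage_cluster_sample(flights, 2, 5, seed=3)   # 2 flights, 5 each
print(len(one_stage), len(two_stage))
```

Single stage surveys all 40 passengers on the two selected flights; two stage surveys only 10 of them.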
Multi Stage Sampling

q The population is divided into multiple clusters, and these clusters are further
divided and grouped into various subgroups (strata) based on similarity.
q One or more clusters can be randomly selected from each stratum.
q This process continues until the clusters can't be divided any further.
q Example: A country can be divided into states, cities, and urban and rural areas,
and all the areas with similar characteristics can be merged together to form a
stratum.

Non-Probability Sampling
q Elements are chosen purposefully/intentionally from the population
on the basis of past experience and knowledge.
q It is a less stringent method than probability sampling.
q This sampling method depends heavily on the expertise of the researchers.
q It is carried out by observation, and researchers use it widely for qualitative
research.
q It is mainly classified into:
Ø Quota Sampling
Ø Purposive / Judgemental Sampling
Ø Convenience Sampling
Ø Referral / Snowball Sampling

Quota Sampling
q Quota sampling works by first dividing the selected population into mutually
exclusive subgroups.
q The proportion of each subgroup is measured, and these proportions are then
used in the final sampling process.
q The proportions of the selected subgroups are used as boundaries for selecting
a sample in which the subgroups are proportionally represented.
q There are two types of quota sampling:
ü proportional
ü non-proportional

Proportional Quota Sampling
q In proportional quota sampling you want to represent the major characteristics
of the population by sampling a proportional amount of each.
q The problem here is that you have to decide the specific characteristics on
which you will base the quota. Will it be by gender, age, education, race,
religion, etc.?
q For example, if you know the population has 40% women and 60% men, and
that you want a total sample size of 100, you will continue sampling until you
get those percentages and then you will stop. So, if you've already got the 40
women for your sample, but not the 60 men, you will continue to sample
men, but even if legitimate women respondents come along, you will not
sample them because you have already “met your quota.”
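The 40% women / 60% men example can be sketched as a loop that accepts respondents in arrival order until each quota is filled. The respondent stream, its labels, and the function name `proportional_quota_sample` are hypothetical; note that nothing here is random, which is exactly what makes quota sampling a non-probability method.

```python
def proportional_quota_sample(respondents, proportions, total):
    """Accept respondents in arrival order until every subgroup quota is met."""
    quotas = {group: round(share * total) for group, share in proportions.items()}
    counts = {group: 0 for group in quotas}
    sample = []
    for person, group in respondents:
        if group in counts and counts[group] < quotas[group]:
            sample.append((person, group))
            counts[group] += 1
        if counts == quotas:           # all quotas met, so stop sampling
            break
    return sample

# Hypothetical stream of 200 respondents arriving alternately as woman / man.
stream = [(f"r{i}", "woman" if i % 2 == 0 else "man") for i in range(200)]
quota_sample = proportional_quota_sample(stream, {"woman": 0.4, "man": 0.6}, 100)

women = sum(1 for _, g in quota_sample if g == "woman")
men = sum(1 for _, g in quota_sample if g == "man")
print(women, men)   # 40 women and 60 men
```

Once the 40th woman has been accepted, later women in the stream are skipped while the loop keeps looking for men.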

Non-Proportional Quota Sampling
q Use when it is important to ensure that a number of sub-groups in the field of
study are well-covered.
q Use when you want to compare results across sub-groups.
q Use when there is likely to be a wide variation in the studied characteristic
within minority groups.
Ø Identify sub-groups from which you want to ensure sufficient coverage.
Specify a minimum sample size from each sub-group.
Ø Here, you're not concerned with having numbers that match the
proportions in the population. Instead, you simply want to have enough to
ensure that you will be able to talk about even small groups in the
population.
q Example: A study of the prosperity of ethnic groups across a city specifies that
a minimum of 50 people from each of ten named groups must be included in the
study. The distributions of incomes within the ethnic groups are then compared
against one another.

Purposive / Judgemental Sampling
q Samples are chosen only on the basis of the researcher's knowledge
and judgement.
q It enables the researcher to select the cases that will best help
answer the research questions and meet the objective.
q The sample is chosen because it serves a particular purpose.
q Example-1: In online live voting to select a good singer in a
competition, people who have an interest in singing can be selected for the
sample.
q Example-2: If we want to understand the thought process of people who
are interested in pursuing a master's degree, then the selection criterion would be
“Are you interested in a Masters in..?”
q All the people who respond with a “No” will be excluded from our sample.
Convenience Sampling
q Convenience sampling (also called accidental sampling or grab sampling) is
where you include people who are easy to reach.
q Samples are taken mainly on the basis of what is readily available.
q Any sample that is convenient to the researcher or data analyst can be chosen.
The task is done without any principles or theories.
q For example, you could survey people from:
ü Your workplace,
ü Your school,
ü A club you belong to,
ü The local mall.

q Example: Suppose I would like to select 5 students from a class of
75 students. I choose the 5 students who sit nearest to me, without any
principle of selection.
Referral / Snowball Sampling
q The snowball sampling method is based purely on referrals, and that is how
the researcher is able to generate a sample.
q The researcher takes the help of the first element selected from the
population and asks that person to recommend others who fit the description
of the sample needed.
q This referral technique goes on, increasing the size of the sample like
a snowball.
Example: If you are studying the level of customer satisfaction among the members
of an elite country club, you will find it extremely difficult to collect primary
data unless a member of the club agrees to have a direct conversation with you
and provides the contact details of the other members of the club.
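The referral process can be sketched as a breadth-first walk over a "who refers whom" mapping. The club members `A` to `F`, their referral lists, and the function name `snowball_sample` are hypothetical; in practice referrals arrive one conversation at a time rather than from a ready-made dictionary.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Grow the sample by following referrals outward from the initial contacts."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        # Ask this person for referrals; skip anyone already known.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical club: each member names the other members they can introduce.
club = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}
members = snowball_sample(club, ["A"], 5)
print(members)   # ['A', 'B', 'C', 'D', 'E']
```

Everyone in the sample is reachable from the first contact `A`, which illustrates why a snowball sample reflects the social circle of the initial respondents.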
Sampling Errors
q Sampling error is a statistical error that occurs when an analyst does not select a
sample that represents the entire population of data.
q The results found in the sample thus do not represent the results that would be
obtained from the entire population.
q Sampling error can be reduced by randomizing sample selection and/or
increasing the number of observations.
q It is most pronounced when the sample size is very small (10 to 100).

Example: Suppose you wanted to figure out what percentage of people were under 18,
and a sample of a thousand people gave the figure 19.357%. If the actual percentage
is 19.300%, the difference (19.357 – 19.300) of 0.057 percentage points is the
sampling error of that sample. If you continued to take samples of 1,000 people,
you'd probably get slightly different statistics, 19.1%, 18.9%, 19.5% etc., but they
would all be around the same figure. This is one of the reasons that you'll often
see sample sizes of 1,000 or 1,500 in surveys: they produce a very acceptable margin
of error of about 3%.

Formula: an approximate formula for the margin of error is 1/√n, where n is the
size of the sample. For example, a random sample of 1,000 has a margin of error of
about 1/√1000 ≈ 3.2%.
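The 1/√n rule and the idea of repeated samples giving slightly different figures can be sketched as below. The synthetic population of 100,000 people with exactly 19.3% under 18, the seed, and the function names are assumptions for illustration only. The 1/√n rule is a rough approximation, close to the usual 95% margin for a proportion near 50%.

```python
import math
import random

def approx_margin_of_error(n):
    """Rule-of-thumb margin of error for a random sample of size n."""
    return 1 / math.sqrt(n)

def sample_percentage(population, n, rng):
    """Percentage of 1s (people under 18) in one random sample of size n."""
    return 100 * sum(rng.sample(population, n)) / n

print(f"{approx_margin_of_error(1000):.1%}")   # 3.2%

# Hypothetical population: 1 = under 18 (19.3%), 0 = 18 or older.
population = [1] * 19_300 + [0] * 80_700
rng = random.Random(0)
estimates = [sample_percentage(population, 1000, rng) for _ in range(5)]
print(estimates)   # five figures scattered around 19.3
```

Each estimate differs from 19.3 by its own sampling error, and larger values of n shrink both the scatter and the 1/√n bound.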
Five Common Types of Sampling Errors
q Population Specification Error: This error occurs when the researcher does
not understand who they should survey.
q Sample Frame Error: A frame error occurs when the wrong sub-population
is used to select a sample.
q Selection Error: This occurs when respondents self-select their participation
in the study, so that only those who are interested respond. Selection error can
be controlled by going to extra lengths to get participation.
q Non-Response: Non-response errors occur when respondents differ from
those who do not respond. This may occur because the potential
respondent either was not contacted or refused to respond.
q Sampling Errors: These errors occur because of variation in the number or
representativeness of the sample that responds. Sampling errors can be
controlled by (1) careful sample designs, (2) large samples, and (3) multiple
contacts to ensure a representative response.

Recommended Text and Reference Books
q Text Book:
Ø J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan
Kaufmann, 3rd ed., 2011.
q Reference Books:
Ø M. H. Dunham. Data Mining: Introductory and Advanced Topics. Pearson
Education, 2006.
Ø I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools
and Techniques. Morgan Kaufmann, 2000.
Ø D. Hand, H. Mannila and P. Smyth. Principles of Data Mining. Prentice-Hall,
2001.
