Unit-3 Chapter-5 Sampling and Sampling Distributions
SAMPLING DISTRIBUTIONS
ESTIMATION…
India’s population = 132 Cr.
TV Viewership = 66 Cr.
No. of TV Sets = 16 Cr. (hypothetical)
Ludhiana
Birinder Singh, Assistant Professor, PCTE
IN THIS CHAPTER, WE EXAMINE
QUESTIONS SUCH AS
How do we know when our sample accurately
reflects the entire population?
WHY SAMPLING?
The testing process is destructive (Time Constraint)
The population is too large to be completely tested
It is almost impossible to define the population
DEFINITIONS
Population: All items that have been chosen for study. A study that examines every item of the population is called a census.
Sample: A portion chosen from the population.
CONVENTIONS TO BE USED
Characteristic     Population Parameter   Sample Statistic
Size               N                      n
Mean               µ                      x̄
Std. Deviation     σ                      s
Proportion         p or π                 p̄ or p
SAMPLING METHODS
Non-Probability Sampling Methods
The probability that an element of the population will be drawn is not known.
Classifications:
  Convenience Sampling
  Judgemental Sampling
  Voluntary Response Sampling
Probability Sampling Methods
The probability that an element of the population will be drawn is known. It is also called random sampling.
Methods:
  Simple Random Sampling
  Systematic Sampling
  Stratified Sampling
  Cluster Sampling
SIMPLE RANDOM SAMPLING
Simple random sampling selects samples by methods that allow each possible sample to have an equal probability of being picked and each item in the entire population to have an equal chance of being included in the sample.
Ex: Selecting a sample of 2 students from four students A, B, C, D. There are 6 possible pairs (AB, AC, AD, BC, BD, CD), so each pair has a probability of 1/6 of being picked.
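A minimal Python sketch of the A, B, C, D example, assuming the 2 students are drawn without replacement. It lists every possible pair, confirms that each pair has probability 1/6, and shows that each student has the same 1/2 chance of being included. The seed value is arbitrary and only makes the single random draw reproducible.

```python
import itertools
import random
from fractions import Fraction

students = ["A", "B", "C", "D"]

# Every possible sample of 2 students drawn without replacement
all_samples = list(itertools.combinations(students, 2))
prob_each_sample = Fraction(1, len(all_samples))  # equal probability for each sample

# Chance that student A is included: number of samples containing A x probability of each
prob_include_A = sum(1 for s in all_samples if "A" in s) * prob_each_sample

# Draw one simple random sample (seeded so the draw is reproducible)
rng = random.Random(42)
drawn = rng.sample(students, 2)

print(all_samples)
print(prob_each_sample, prob_include_A)
print(drawn)
```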
SIMPLE RANDOM SAMPLING
Merits
  The sample is more representative of the population.
  This theory is more reliable & highly developed.
  It saves time & effort.
Demerits
  It requires a complete list of the population units to be sampled.
  If the area of coverage is large, random samples are also widely scattered geographically.
SYSTEMATIC SAMPLING
In systematic sampling, elements are selected from the population at a uniform interval that is measured in time, order or space.
Ex: If we wanted to interview every 20th student on a college campus, we would choose a random starting point among the first 20 names in the student directory and then pick every 20th name thereafter.
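A minimal Python sketch of the every-20th-student example. The 200-name directory and the function name `systematic_sample` are hypothetical placeholders: a random start is picked among the first k names, then every k-th name is taken.

```python
import random

def systematic_sample(population, k, rng=random):
    """Choose a random start among the first k items, then take every k-th item."""
    start = rng.randrange(k)      # index 0..k-1, i.e. one of the first k names
    return population[start::k]   # start, start + k, start + 2k, ...

# Hypothetical directory of 200 students (placeholder names)
directory = [f"Student {i}" for i in range(1, 201)]
sample = systematic_sample(directory, 20, random.Random(7))
print(len(sample), sample[:3])
```

With 200 names and an interval of 20, every possible starting point yields exactly 10 students.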
STRATIFIED RANDOM SAMPLING VS
wide variation between the groups.
SAMPLING DISTRIBUTIONS
Sampling Distribution of the Mean: the probability distribution of all the possible means of samples of a given size drawn from a population, i.e., a distribution of the sample means.
Ex: Suppose our samples each consist of ten 25-year-old women from a city with a population of 1,00,000. By computing the mean height and SD of each of these samples, we would quickly see that the mean and SD of each sample are different.
A SAMPLING DISTRIBUTION
Let’s create a sampling distribution of means…
Take a sample of size 1,500 from the US. Record the mean
income. Our census said the mean is $30K.
Take another sample of size 1,500 from the US and record its mean income. Repeat this many times, recording the mean income of every sample.
[Figure: the recorded sample means cluster around the census mean of $30K]
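The repeated-sampling idea above can be simulated. This is a sketch under stated assumptions: the population of 100,000 incomes is synthetic (an exponential shape averaging roughly $30K, not real US data), and samples are drawn with replacement. It records the mean of each sample and compares the centre and spread of those means with the population mean and with σ/√n.

```python
import math
import random
import statistics

rng = random.Random(2024)

# Hypothetical population of 100,000 incomes averaging roughly $30K.
# The exponential shape is an illustrative assumption, not real US data.
population = [rng.expovariate(1 / 30_000) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = math.sqrt(sum((x - pop_mean) ** 2 for x in population) / len(population))

n = 1_500        # size of each sample, as on the slide
repeats = 1_000  # number of samples drawn

# Draw many samples (with replacement) and record each sample's mean income
sample_means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(repeats)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)  # spread of the sampling distribution

print(f"population mean    {pop_mean:,.0f}")
print(f"mean of the means  {mean_of_means:,.0f}")
print(f"SD of the means    {sd_of_means:,.0f}")
print(f"sigma / sqrt(n)    {pop_sd / math.sqrt(n):,.0f}")
```

The mean of the recorded means lands very close to the population mean, and their SD lands close to σ/√n.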
A SAMPLING DISTRIBUTION
Say that the standard deviation of this distribution is $10K. Think back to the empirical rule. What are the odds that you would get a sample mean that is more than $20K off?
[Figure: normal curve centred at $30K, axis marked from −3z to 3z; 2.5% of the area lies in each tail beyond ±2z]
A $20K miss is 2 standard deviations. About 95% of the distribution lies within ±2 SD, so the odds are roughly 2.5% + 2.5% = 5%.
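The empirical-rule answer can be checked with the exact normal tail area. This small sketch computes P(|Z| > k) from the error function in Python's standard `math` module; the helper name `prob_beyond` is only illustrative.

```python
import math

def prob_beyond(k):
    """P(|Z| > k) for a standard normal Z, computed with the error function."""
    return 1 - math.erf(k / math.sqrt(2))

se = 10_000     # SD of the sampling distribution (from the slide)
miss = 20_000   # "more than $20K off"
k = miss / se   # = 2 standard errors
print(f"{prob_beyond(k):.2%}")  # about 4.55%, i.e. roughly 2.5% per tail
```

The exact figure of about 4.55% is what the empirical rule rounds to "about 5%".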
STANDARD ERROR (S.E.)
The standard deviation of the distribution of a
sample statistic is known as the standard error of
the statistic.
SE indicates how spread out (dispersed) the sample means are.
SE indicates not only the size of the chance error that has been made, but also the accuracy we are likely to get if we use a sample statistic to estimate a population parameter.
A sample mean whose sampling distribution is less spread out (smaller SE) is a better estimator of the population mean.
RELATIONSHIPS BETWEEN POPULATION PARAMETERS AND
THE SAMPLING DISTRIBUTION OF THE SAMPLE MEAN
The expected value of the sample mean is equal to the population mean:
$$E(\bar{X}) = \mu_{\bar{X}} = \mu$$
The variance of the sample mean is equal to the population variance divided by the sample size:
$$V(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$$
The standard deviation of the sample mean, known as the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size:
$$\text{s.e.}(\bar{X}) = SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
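These relationships can be verified exactly on a tiny population. This is a sketch assuming a made-up population {2, 4, 6, 8} and samples of size 2 drawn with replacement, so the draws are independent as in the infinite-population case. Exact fractions avoid any rounding.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]   # tiny illustrative population (hypothetical)
n = 2                       # sample size

N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N   # population variance

# All N**n equally likely ordered samples drawn WITH replacement
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
e_xbar = sum(means) / len(means)                              # E(X-bar)
v_xbar = sum((m - e_xbar) ** 2 for m in means) / len(means)   # V(X-bar)

print(mu, e_xbar)           # E(X-bar) equals mu
print(sigma2 / n, v_xbar)   # V(X-bar) equals sigma^2 / n
```

Here µ = 5 and σ² = 5, and the 16 possible sample means give E(X̄) = 5 and V(X̄) = 5/2 = σ²/n.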
CENTRAL LIMIT THEOREM
As the sample size increases, the sampling distribution of the mean (SDM) approaches a normal distribution, irrespective of the shape of the population distribution.
As a rule of thumb, for n ≥ 30 the SDM is taken to be normally distributed.
This is called the Central Limit Theorem (CLT).
The significance of the CLT is that it permits us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the frequency distribution of that population.
If the population is normally distributed, the sample means are also normally distributed, regardless of the sample size.
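A rough simulation of the CLT, assuming a strongly right-skewed exponential population with mean 1 and SD 1 (chosen only for illustration). If the means of samples of size 30 are close to normally distributed, about 95% of them should fall within ±1.96 standard errors of µ.

```python
import random
import statistics

rng = random.Random(1)

# Strongly right-skewed population: exponential with mean 1 and SD 1
mu, sigma = 1.0, 1.0
n = 30              # rule-of-thumb sample size from the slide
repeats = 20_000

# Mean of each of many samples of size n
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(repeats)]

se = sigma / n ** 0.5
# Share of sample means within +/-1.96 standard errors of mu
inside = sum(abs(m - mu) <= 1.96 * se for m in means) / repeats
print(round(inside, 3))
```

Even though the population is far from normal, the share comes out close to 0.95.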
WORKING METHODOLOGY
Make sure the population is infinite, i.e., N is not given.
Check whether n ≥ 30; if yes, the SDM is considered to be normally distributed.
Find the Z score using the formula:
$$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$
where x̄ = sample mean; µ_x̄ = mean of the sample means, with µ_x̄ = µ; and σ_x̄ = σ/√n.
PRACTICE PROBLEMS – SDM / CLT
A bank calculates that its individual savings accounts have a mean balance of $2000 and an SD of $600. If the bank takes a random sample of 100 accounts, what is the probability that the sample mean will lie between $1900 and $2050? (0.75)
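A sketch of the bank problem following the working methodology: standard error σ/√n, two Z scores, and the normal area between them. It uses `statistics.NormalDist` from Python's standard library (3.8+) in place of a printed Z table.

```python
from statistics import NormalDist

mu, sigma, n = 2000, 600, 100   # population mean, SD and sample size
se = sigma / n ** 0.5           # standard error = 600 / 10 = 60

std_normal = NormalDist()       # mean 0, SD 1
z_low = (1900 - mu) / se        # about -1.67
z_high = (2050 - mu) / se       # about 0.83
p = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(p, 4))
```

The result is about 0.75, matching the answer in brackets.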
PRACTICE PROBLEMS – SDM / CLT
A continuous manufacturing process produces items whose weights are normally distributed with a mean of 8 kg and an SD of 3 kg. A random sample of 16 items is to be drawn. What is the probability that the sample mean exceeds 9 kg? (9.18%)
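A sketch of the manufacturing problem. Because the population itself is normal, x̄ is normal even though n = 16 is below 30. With the unrounded z = 1.333…, the code gives about 9.12%. The bracketed 9.18% comes from rounding z to 1.33 before reading the table.

```python
from statistics import NormalDist

# Population is normal, so x-bar is normal even though n = 16 < 30
mu, sigma, n = 8, 3, 16
se = sigma / n ** 0.5          # 3 / 4 = 0.75 kg
z = (9 - mu) / se              # about 1.33
p = 1 - NormalDist().cdf(z)    # P(x-bar > 9)
print(f"{p:.2%}")
```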
THE FINITE POPULATION MULTIPLIER
Many of the populations examined by decision makers are finite, i.e., they have a limited size.
The standard error of the mean for a finite population is given by:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$$
where $\sqrt{\frac{N - n}{N - 1}}$ is called the finite population multiplier,
N = size of population,
n = sample size.
PRACTICE PROBLEMS
From a population of 125 items with a mean of 105 and an SD of 17, 64 items were chosen.
Find the standard error. (1.4904)
What is P(107.5 < x̄ < 109)? (0.0428)
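A sketch of this problem with the finite population multiplier applied. The helper `standard_error` is a hypothetical name that multiplies σ/√n by √((N − n)/(N − 1)) when N is given. The code gives SE ≈ 1.4904 and a probability of about 0.043. The bracketed 0.0428 comes from rounding the Z scores to 1.68 and 2.68 before using the table.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n, N=None):
    """sigma/sqrt(n), times the finite population multiplier when N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))   # finite population multiplier
    return se

N, mu, sigma, n = 125, 105, 17, 64
se = standard_error(sigma, n, N)
z = NormalDist()
p = z.cdf((109 - mu) / se) - z.cdf((107.5 - mu) / se)   # P(107.5 < x-bar < 109)
print(round(se, 4), round(p, 4))
```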
PRACTICE PROBLEMS
From a population of 75 items with a mean of 364 and a variance of 18, 32 items were chosen.
Find the standard error.
What is P(363 < x̄ < 366)?
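A sketch of this problem, where the SD is first recovered from the given variance and the finite population multiplier is applied. By this computation, the standard error is about 0.572 and the probability is about 0.96. These values are computed here and are not answers stated on the slide.

```python
import math
from statistics import NormalDist

N, mu, variance, n = 75, 364, 18, 32
sigma = math.sqrt(variance)              # SD from the given variance
fpm = math.sqrt((N - n) / (N - 1))       # finite population multiplier
se = sigma / math.sqrt(n) * fpm          # 0.75 * fpm
z = NormalDist()
p = z.cdf((366 - mu) / se) - z.cdf((363 - mu) / se)   # P(363 < x-bar < 366)
print(round(se, 4), round(p, 4))
```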
ESTIMATION
When you are ready to cross a street, you estimate the speed of the car that is approaching you, the distance between you and the car, and your own speed.
Based on these quick estimates, you decide whether to wait, walk or run.
REASONS FOR ESTIMATES
A unit head estimates next year's admissions.
A credit manager estimates whether a purchaser will eventually pay his bills.
Homemakers estimate the increase in commodity prices.
TYPES OF ESTIMATES
Point Estimate
  A single number used to estimate an unknown population parameter.
  Ex: A department head estimates from current data that the MBA course will have 300 students next year.
  It is often insufficient, as it is either right or wrong.
  Evaluation of the precision of the estimator is not possible.
Interval Estimate
  A range of values used to estimate an unknown population parameter.
  Ex: A department head estimates from current data that the MBA course will have 280-320 students next year.
  It indicates the error in two ways:
    the extent of the range;
    the probability of the true population parameter lying within that range.
ESTIMATOR & ESTIMATES
An estimator is a sample statistic used to estimate a population
parameter.
The sample mean x̄ can be an estimator of the population mean µ.
The sample proportion p̄ can be an estimator of the population proportion p.
Ex: Population: employees in a furniture factory; parameter to be estimated: mean turnover per year; estimator: mean turnover for a period of 1 month; estimate: 8.9% turnover per year.
It should be efficient: efficiency refers to the size of the standard error of the statistic. Of two estimators, the one whose sampling distribution has the smaller standard error is preferred.
The following are the attendances (in thousands) at nine randomly selected sporting events: 8.8, 14.0, 21.3, 7.9, 12.5, 20.6, 16.3, 14.1, 13.0. Find point estimates of the mean and the variance of the population from which the sample was drawn. (14.28, 21.12)
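A short check of the attendance problem using Python's `statistics` module. `statistics.variance` divides by n − 1, which is the sample variance used as the point estimate of σ².

```python
import statistics

attendance = [8.8, 14.0, 21.3, 7.9, 12.5, 20.6, 16.3, 14.1, 13.0]  # thousands

xbar = statistics.mean(attendance)      # point estimate of the population mean
s2 = statistics.variance(attendance)    # sample variance (n - 1 divisor), estimate of sigma^2
print(round(xbar, 2), round(s2, 2))     # 14.28 21.12
```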
INTERVAL ESTIMATE
Interval Estimate: A range of values within which a population parameter is likely to be.
Confidence Level: The probability that is associated with an interval estimate.
Confidence Interval: The range of the estimate for a given confidence level.
$$\bar{x} - z\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z\,\sigma_{\bar{x}}$$
where x̄ = sample mean (point estimate of the mean), z = confidence coefficient, σ_x̄ = standard error and µ = population mean.
COMMONLY USED CONFIDENCE LEVELS &
CONFIDENCE COEFFICIENTS
Confidence Level (%)   Confidence Coefficient (z)
95                     1.96
98                     2.33
99                     2.58
68.26                  1
95.4                   2
99.73                  3
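The confidence coefficients in the table can be reproduced from the inverse of the standard normal CDF, using `statistics.NormalDist.inv_cdf` (Python 3.8+). For a two-sided interval, z satisfies P(−z < Z < z) = level. The 90% level is included as an extra row because it is used in a later practice problem. The output also shows that z = 3 corresponds to about 99.73%, whereas 99.9% would require z ≈ 3.29.

```python
from statistics import NormalDist

def confidence_coefficient(level):
    """Two-sided z such that P(-z < Z < z) = level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf((1 + level) / 2)

for level in (0.6826, 0.90, 0.954, 0.95, 0.98, 0.99, 0.9973, 0.999):
    print(f"{level:.2%}  z = {confidence_coefficient(level):.3f}")
```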
INTERVAL ESTIMATES OF MEAN FROM
LARGE SAMPLES
There are two cases:
Case 1: When the population SD is known
Case 2: When the population SD is not known
COMPUTATIONAL PROCEDURE
Choose the level of confidence.
Find ‘Z’ for the chosen level.
Compute the standard error.
If σ is known:
  For an infinite population: $$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
  For a finite population: $$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$$
If σ is not known:
  Sample SE: $$s_{\bar{x}} = \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}$$
  where the sample SD is $$s = \hat{\sigma} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
  The sample SD s is used as an estimate of the population SD.
months. (34.61 ≤ µ≤ 37.39)
PRACTICE PROBLEMS – ESTIMATION
50 randomly selected pieces of plastic rope had a mean breaking strength of 25 psi and an SD of 1.4 psi. Construct a 99% confidence interval for the mean breaking strength. (psi = pounds per square inch) (24.49 ≤ µ ≤ 25.51)
PRACTICE PROBLEMS – ESTIMATION
A large automotive parts wholesaler needs an estimate of the mean life it can expect from windshield wiper blades under typical driving conditions. Management has already determined that the SD of the population life is 6 months. A random sample of 100 wiper blades has been selected, with a mean life of 21 months. Find an interval estimate of the mean life with a confidence level of 95%. (19.82 ≤ µ ≤ 22.18)
PRACTICE PROBLEMS – ESTIMATION
From a population of 540, a sample of 60 individuals is taken. From this sample, the mean is found to be 6.2 and the SD is 1.368.
Find the estimated standard error of the mean. (0.167)
Construct a 96 percent confidence interval for the mean. (5.86 ≤ µ ≤ 6.54)
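A sketch of the large-sample computational procedure as one function, applied to the three practice problems above. `z_interval` is a hypothetical helper. The `sd` argument is σ when it is known, or the sample SD s used as an estimate of σ. The finite population multiplier is applied only when N is passed. It uses exact z values from `NormalDist` instead of table values (2.58, 1.96, 2.05), so in general the last digit can differ from table-based answers. Here the results round to the bracketed answers.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sd, n, level, N=None):
    """Large-sample interval x-bar +/- z * SE.

    sd is sigma if known, otherwise the sample SD s used as an estimate of sigma.
    If the population size N is given, the finite population multiplier is applied.
    """
    z = NormalDist().inv_cdf((1 + level) / 2)   # confidence coefficient
    se = sd / math.sqrt(n)                      # standard error (infinite population)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))      # finite population multiplier
    return xbar - z * se, xbar + z * se

rope = z_interval(25, 1.4, 50, 0.99)              # plastic rope, s from the sample
wipers = z_interval(21, 6, 100, 0.95)             # wiper blades, sigma known
survey = z_interval(6.2, 1.368, 60, 0.96, N=540)  # finite population of 540
for lo, hi in (rope, wipers, survey):
    print(f"{lo:.2f} <= mu <= {hi:.2f}")
```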
INTERVAL ESTIMATES OF MEAN FROM
SMALL SAMPLES (T DISTRIBUTION)
In certain cases the normal distribution is not the appropriate sampling distribution, namely when we are estimating the population SD and the sample size is small, i.e., less than 30.
In such cases another distribution, called the t distribution, is appropriate.
It is also called Student’s t distribution.
The second condition is that the population standard deviation must be unknown.
T - DISTRIBUTION
The shape of the t distribution is very similar to the shape of the standard normal distribution.
The t distribution has a (slightly) different shape for each possible sample size.
They are all symmetric and unimodal.
They are somewhat broader than Z, reflecting the additional uncertainty resulting from using s in place of σ.
As n gets larger and larger, the shape of the t distribution approaches the standard normal.
It has more area in the tails than the normal distribution.
We need to know the degrees of freedom of the t distribution. If the sample size is n, then df = n − 1.
CONDITIONS FOR T DISTRIBUTION
n < 30
Population SD (σ) is not known.
Population is assumed to be normal or nearly normal.
Note:
Since σ is not known, σ̂_x̄ is used in lieu of σ_x̄.
The interval estimate of the population mean is
$$\bar{x} - t\,\hat{\sigma}_{\bar{x}} \le \mu \le \bar{x} + t\,\hat{\sigma}_{\bar{x}}, \quad \text{where } t = \frac{\bar{x} - \mu}{\hat{\sigma}_{\bar{x}}}$$
The t-distribution table shows areas and t-values for only a few percentages (10, 5, 2, 1).
COMPUTATIONAL PROCEDURE
Choose the confidence level.
Find the total chance of error, i.e., α = 1 − CL.
Find the degrees of freedom, i.e., df = n − 1.
PRACTICE PROBLEMS – T DISTRIBUTION
Determine the 95% confidence interval for the mean burning time of marine flares if 9 flares were tested and yielded a mean burning time of 40 minutes with an SD of 10 minutes. (32.32 ≤ µ ≤ 47.68)
PRACTICE PROBLEMS – T DISTRIBUTION
Seven homemakers were randomly sampled, and it was determined that the distances they walked in their housework had an average of 39.2 miles per week and an SD of 3.2 miles per week. Construct a 95% confidence interval for the population mean. (36.24 ≤ µ ≤ 42.16)
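A sketch of the two t-distribution problems. Python's standard library has no t quantile function, so the two critical values needed here are hard-coded from a standard t table (two-tailed 5%: 2.306 for df = 8 and 2.447 for df = 6). This is a deliberate simplification, not a full table. If SciPy is available, `scipy.stats.t.ppf(0.975, df)` returns the same values. The flares interval comes out as 32.31 to 47.69. The bracketed 32.32 to 47.68 results from rounding the standard error to 3.33.

```python
import math

# Two-tailed 5% critical t values from a standard t table, keyed by df = n - 1.
# Only the two entries needed here are included (assumption: not a full table).
T_TABLE_95 = {6: 2.447, 8: 2.306}

def t_interval_95(xbar, s, n):
    """95% interval x-bar +/- t * sigma-hat_xbar, with sigma-hat_xbar = s / sqrt(n)."""
    t = T_TABLE_95[n - 1]
    se_hat = s / math.sqrt(n)
    return xbar - t * se_hat, xbar + t * se_hat

flares = t_interval_95(40, 10, 9)      # df = 8
walking = t_interval_95(39.2, 3.2, 7)  # df = 6
for lo, hi in (flares, walking):
    print(f"{lo:.2f} <= mu <= {hi:.2f}")
```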
DECISION FLOW DIAGRAM - ESTIMATION
Start
Is n ≥ 30?
  Yes: use the ‘Z’ table. Stop.
  No: Is the population known to be normally distributed?
    Yes: Is the population SD known?
      Known: use the ‘Z’ table. Stop.
      Not known: use the ‘t’ table. Stop.
    No: use a statistician. Stop.
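The decision flow can be written as a small function. The name `choose_table` and its parameters are hypothetical. The branches follow the diagram in order: sample size first, then normality of the population, then whether σ is known.

```python
def choose_table(n, population_normal, sigma_known):
    """Follow the estimation decision flow diagram for a population mean."""
    if n >= 30:
        return "Z"
    if not population_normal:
        return "consult a statistician"
    return "Z" if sigma_known else "t"

print(choose_table(50, False, False))  # Z
print(choose_table(9, True, False))    # t
print(choose_table(16, True, True))    # Z
print(choose_table(10, False, True))   # consult a statistician
```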
SAMPLING DISTRIBUTION OF
PROPORTIONS (SDP)
                                  Means        Proportions
Population parameter              µ            p
Sample statistic                  x̄            p̄
Mean of sampling distribution     µ_x̄ = µ      µ_p̄ = p
SD of sampling distribution       σ_x̄          σ_p̄
Estimated SD (standard error)     σ̂_x̄          σ̂_p̄
SDP – FORMULAE
Standard Error
From the population proportion:
$$\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}}$$
Estimated from the sample proportion:
$$\hat{\sigma}_{\bar{p}} = \sqrt{\frac{\bar{p}\,\bar{q}}{n}}$$
where q = 1 − p and q̄ = 1 − p̄.
PRACTICE PROBLEMS – SDP
A TV company wishes to find out the proportion of families in a city who own a TV. A sample survey of 400 families revealed that 320 of them owned a TV. Estimate, with 95% confidence, the percentage of families in the entire city who own a TV. (76.08% ≤ p ≤ 83.92%)
PRACTICE PROBLEMS – SDP
Delhi Police intends to introduce a new uniform for the officers’ cadre. A survey estimates the proportion of officers who would prefer the change. Results showed that 45 out of 75 favored the change. Estimate the population proportion in favor of the proposal with a 90% confidence level. (50.65% ≤ p ≤ 69.35%)
PRACTICE PROBLEMS – SDP
Dr. Benjamin, a noted social psychologist, surveyed 150 top executives and found that 42% of them were unable to add fractions correctly.
Estimate the standard error of the proportion. (0.0403)
Construct a 99% confidence interval for the true proportion of top executives who cannot correctly add fractions. (0.316 ≤ p ≤ 0.524)
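A sketch of the proportion formulas applied to the three SDP practice problems. `proportion_interval` is a hypothetical helper that uses σ̂_p̄ = √(p̄ q̄ / n) and an exact z from `NormalDist`. For the executives problem, 42% of 150 is 63. The TV and executives results match the bracketed answers. The police interval comes out at about 50.7% to 69.3%, and the small difference from the bracketed answer reflects rounding of z and the standard error in table-based working.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, level):
    """Return (estimated SE, interval) for p-bar +/- z * sqrt(p-bar * q-bar / n)."""
    p_bar = successes / n
    se_hat = math.sqrt(p_bar * (1 - p_bar) / n)   # estimated from the sample proportion
    z = NormalDist().inv_cdf((1 + level) / 2)     # confidence coefficient
    return se_hat, (p_bar - z * se_hat, p_bar + z * se_hat)

_, tv = proportion_interval(320, 400, 0.95)
_, police = proportion_interval(45, 75, 0.90)
se_exec, execs = proportion_interval(63, 150, 0.99)   # 42% of 150 = 63
print(tv, police, round(se_exec, 4), execs)
```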