Chapter 6: Sampling Distributions
This chapter is still about the probability distributions of random variables. However, the random
variables are no longer characteristics of raw scores; rather, they are statistics computed from samples of
scores (e.g., the sample mean X̄ or the sample proportion p̂). Of interest to us is the pattern of variation
(distribution) of a sample statistic, known as the sampling distribution of a statistic.
When different samples (of the same size) are randomly drawn from a common population, the sample
statistic of interest (e.g., the sample mean X̄ or the sample proportion p̂) will vary from sample to sample
because of random variation. Thus, a sample statistic is a random variable. The probability that a sample
statistic assumes a particular value depends on the likelihood of the sample being selected. The pattern of
variation in the possible values of the sample statistic forms a distribution known as a sampling
distribution.
Suppose that samples, each of size n, are drawn from a population, X. For each sample, the mean of the n
scores, x̄, is computed. If infinitely many samples are drawn, there will be infinitely many x̄'s, and a
plot of these x̄'s will reveal a pattern in their distribution. This distribution is known as the sampling
distribution of the sample mean, X̄.
As an example, consider a population consisting of the four equally likely values 1, 2, 3 and 4, from which
samples of size 2 are drawn with replacement. Listing all 16 possible samples of size 2, together with the
mean of each sample, the minimum of each sample, and the probability of drawing each sample (all equally
likely, 1/16 each), gives the sampling distributions of the sample mean and of the sample minimum shown
below (a short computational sketch follows the histograms).
Histogram: Sampling Distribution of the Sample Mean:
[Histogram of P(x̄) against x̄ for x̄ = 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0; probabilities range from 0.00 to 0.25.]
Histogram: Sampling Distribution of the Sample Minimum:

m    P(m)
1    7/16
2    5/16
3    3/16
4    1/16

[Histogram of P(m) against m for m = 1, 2, 3, 4; probabilities range from 0.0 to 0.4.]
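The calculation behind these two sampling distributions can be reproduced directly. The sketch below assumes a population consisting of the four equally likely values 1, 2, 3 and 4, with samples of size 2 drawn with replacement; this assumption is what reproduces the probabilities tabulated above (e.g. 7/16 for a minimum of 1):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]                           # assumed equally likely population values
samples = list(product(population, repeat=2))       # all 16 samples of size 2, drawn with replacement

mean_dist = Counter((a + b) / 2 for a, b in samples)
min_dist = Counter(min(a, b) for a, b in samples)
total = len(samples)

print("Sampling distribution of the sample mean:")
for xbar in sorted(mean_dist):
    print(f"  x̄ = {xbar}: P = {Fraction(mean_dist[xbar], total)}")

print("Sampling distribution of the sample minimum:")
for m in sorted(min_dist):
    print(f"  m = {m}: P = {Fraction(min_dist[m], total)}")   # 7/16, 5/16, 3/16, 1/16
```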
The theory of sampling distributions requires random sampling, where a random sample is defined as:
A sample obtained in such a way that each possible sample of a fixed size n has an equal probability
(chance) of being selected.
In studying sampling distributions, we will investigate the variability in sample statistics from
sample to sample. For each sample statistic, we will investigate
- the pattern (shape) of its variability
- its mean
- its standard deviation.
Knowledge of the sampling distribution of a sample statistic will enable us to calculate probabilities
about it. In this course, we will focus on the sampling distributions of the sample mean, X̄, and the
sample proportion, p̂.
Suppose that k samples, each of size n, are drawn from a common population, X. For each sample, the
mean of the n scores, x̄, is computed. If k samples are drawn, there are k values of x̄, as summarized in
the following table.

Sample    Observations                  Sample mean
1         x11, x12, ..., x1n            x̄1
2         x21, x22, ..., x2n            x̄2
...       ...                           ...
k         xk1, xk2, ..., xkn            x̄k

The value xij can be thought of as the jth observation of sample i. If Xj denotes 'the jth observation of a
sample', then Xj is a random variable that varies from sample to sample. Thus, for samples of size n,
there are n random variables, X1, X2, ..., Xn. Furthermore, since all the xij's are independent
observations from the same population, X, the random variables X1, X2, ..., Xn are independent, each with
the same distribution as X.
Let the sample mean of the ith sample be denoted by x̄i. Due to random variation, the values of
x̄1, x̄2, ..., x̄k will also vary. Thus, there exists a random variable, namely the 'sample mean', denoted by
X̄, having possible values x̄1, x̄2, ..., x̄k. By definition, the sample mean is the average of the n
observations in the sample, i.e.

X̄ = (1/n)(X1 + X2 + ... + Xn)
  = (1/n)X1 + (1/n)X2 + ... + (1/n)Xn
Suppose that the common population, X, has mean μ and variance σ². The mean and variance of X̄
are derived as follows:

μ_X̄ = E(X̄) = E[(1/n)X1 + (1/n)X2 + ... + (1/n)Xn]
            = (1/n)E(X1) + (1/n)E(X2) + ... + (1/n)E(Xn)
            = (1/n)μ + (1/n)μ + ... + (1/n)μ = n(1/n)μ = μ

σ_X̄² = Var(X̄) = Var[(1/n)X1 + (1/n)X2 + ... + (1/n)Xn]
              = (1/n²)Var(X1) + (1/n²)Var(X2) + ... + (1/n²)Var(Xn)    (since the Xi's are independent)
              = (1/n²)σ² + (1/n²)σ² + ... + (1/n²)σ² = n(1/n²)σ² = σ²/n
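These two results can also be checked informally by simulation. In the sketch below, the exponential population, the sample size and the number of replications are illustrative choices, not values from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 25, 100_000                       # sample size and number of simulated samples (illustrative)
mu, sigma2 = 2.0, 4.0                    # an Exponential population with scale 2 has mean 2, variance 4

samples = rng.exponential(scale=2.0, size=(k, n))
xbars = samples.mean(axis=1)             # k realizations of the sample mean X̄

print("empirical E(X̄):", xbars.mean(), "  theory:", mu)
print("empirical Var(X̄):", xbars.var(), "  theory:", sigma2 / n)
```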
Example: The mean of a population is 64, and its standard deviation is 12. A sample of 40
observations is randomly selected. Find the expectation and variance of the sample mean.
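The answer follows directly from the results just derived, μ_X̄ = μ and σ_X̄² = σ²/n; a quick arithmetic check in Python:

```python
import math

mu, sigma, n = 64, 12, 40
exp_xbar = mu                        # E(X̄) = μ = 64
var_xbar = sigma ** 2 / n            # Var(X̄) = σ²/n = 144/40 = 3.6
se_xbar = sigma / math.sqrt(n)       # standard error σ/√n ≈ 1.897
print(exp_xbar, var_xbar, se_xbar)
```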
The Central Limit Theorem is a very important and powerful theorem in statistics. Much of hypothesis
testing and sampling theory is based on this theorem. The Central Limit Theorem provides a justification
for using the normal curve as a model for many naturally occurring phenomena. In most situations, the
theorem works reasonably well with sample sizes greater than 25. The informal statement of the theorem
is as follows:
Suppose X1, X2, ..., Xn are n independent random variables having a common distribution. Then, as
n increases, the distributions of X1 + X2 + ... + Xn and of (X1 + X2 + ... + Xn)/n increasingly
resemble normal distributions.
Note: The proof of the Central Limit Theorem is beyond the scope of this course.
[Figures: distribution of the original population (values of x from about 10 to 30) and the sampling
distributions of x̄ for samples of size n = 10 and n = 30.]
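The pattern shown in the figures can be reproduced by simulation. The right-skewed gamma population and the number of replications below are illustrative assumptions; the notes do not state which population was used for the figures:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 50_000                                            # number of simulated samples

for n in (10, 30):
    # Right-skewed population (illustrative): Gamma(shape=2, scale=5) has mean 10.
    xbars = rng.gamma(shape=2.0, scale=5.0, size=(k, n)).mean(axis=1)
    skew = ((xbars - xbars.mean()) ** 3).mean() / xbars.std() ** 3
    # As n grows, the spread and the skewness of x̄ both shrink, as the CLT predicts.
    print(f"n = {n}: mean of x̄ ≈ {xbars.mean():.2f}, sd ≈ {xbars.std():.2f}, skewness ≈ {skew:.3f}")
```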
Summary
- The mean of the sampling distribution of X̄ is equal to the mean of the original population: μ_X̄ = μ.
- The standard deviation of the sampling distribution of X̄ (also called the standard error of the mean)
  is equal to the standard deviation of the original population divided by the square root of the sample
  size: σ_X̄ = σ/√n.
- The distribution of X̄ is (exactly) normal when the original population is normal.
The CLT says that the distribution of X̄ is approximately normal, regardless of the shape of the
original distribution, when the sample size is large enough!
When the sampling distribution of the sample mean is (exactly) normally distributed, or approximately
normally distributed (by the CLT), we can then compute probabilities about a sample mean using the
standard normal distribution.
Standard Error of the Mean
The standard deviation of the sampling distribution of the mean is called the standard error of the mean
and is symbolized by σ_X̄ = σ/√n. The standard error of a statistic describes the degree to which the
computed statistics will differ from one another when calculated from samples of the same size selected
from the same population model. The larger the standard error, the greater the differences between the
computed statistics.
Example: Consider a normal population with μ = 50 and σ² = 15. Suppose a sample of size 9 is
selected at random. Find: (1) P(45 ≤ X̄ ≤ 60) (2) P(X̄ ≤ 47.5)
Solution:
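Since the population is normal, X̄ is exactly normal with mean 50 and variance 15/9. A short Python sketch of the calculation (scipy's norm.cdf stands in for a Z table):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma2, n = 50, 15, 9
se = sqrt(sigma2 / n)                               # σ_X̄ = √(15/9) ≈ 1.291

p1 = norm.cdf(60, mu, se) - norm.cdf(45, mu, se)    # (1) P(45 ≤ X̄ ≤ 60) ≈ 0.9999
p2 = norm.cdf(47.5, mu, se)                         # (2) P(X̄ ≤ 47.5) ≈ 0.026
print(p1, p2)
```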
Example: A recent report stated that the day-care cost per week in a city is RM200. Suppose that this
figure is taken as the mean cost per week and that the standard deviation is known to be RM25.
(i) Find the probability that a sample of 50 day-care centers would show a mean cost of RM200 or less
per week.
(ii) Suppose the actual sample mean cost for the sample of 50 day-care centers is RM215. Is there any
evidence to refute the claim of RM200 presented in the report?
Solution:
The shape of the original distribution is unknown, but the mean and standard deviation are known. Since
the sample size, n = 50, is large, the CLT applies. Thus, the distribution of X̄ is approximately
normal.
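Under this CLT approximation, a brief Python sketch of both parts (scipy's normal distribution is used here simply as a stand-in for a Z table):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 200, 25, 50
se = sigma / sqrt(n)                 # standard error of the mean ≈ 3.536

p_i = norm.cdf(200, mu, se)          # (i) P(X̄ ≤ 200) = 0.5, since RM200 is the assumed mean
p_ii = 1 - norm.cdf(215, mu, se)     # (ii) P(X̄ ≥ 215) ≈ 1e-05
print(p_i, p_ii)
# The probability in (ii) is extremely small, so a sample mean of RM215 would be strong
# evidence against the RM200 figure in the report.
```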
6.4 Sampling Distribution of Sample Proportions, p̂
Consider a population of size N with members which can be categorized as successes or failures. If the
number of successes in the population is Xc, then the proportion (or percentage or probability) of success
in the population, denoted as p, is p = Xc/N, a constant.
Suppose that a random sample of n observations is taken from the population. Let the number of
successes in the sample be denoted by X. The observed proportion of successes in the sample, denoted by
p̂, is therefore p̂ = X/n. If samples of n observations are repeatedly taken from the same population an
infinite number of times, there will be an infinite number of sample proportions, p̂ = X/n. The value of p̂
will vary from sample to sample because the value of X varies from sample to sample. Thus p̂ is a
random variable and its probability distribution is known as the sampling distribution of sample
proportions.
From Ch 5.1, we know that X, the number of successes in a sample of size n, is a random variable
having a binomial distribution with parameters n and p, written as X ~ Bin(n, p). The mean and variance
of X are
E(X) = np and Var(X) = npq.
It follows that the mean and variance of the random variable p̂ are:

μ_p̂ = E(p̂) = E(X/n) = (1/n)E(X) = (1/n)(np) = p

and

σ_p̂² = Var(p̂) = Var(X/n) = (1/n²)Var(X) = npq/n² = pq/n
For cases where n is sufficiently large (so that np ≥ 5 and nq ≥ 5), the normal approximation to the
binomial distribution may be used to find probabilities about p̂.
Summary
- The mean of the sampling distribution of p̂ is equal to the proportion of the original population, i.e.
  μ_p̂ = p.
- The standard deviation of the sampling distribution of p̂ (also called the standard error of the
  proportion) is σ_p̂ = √(pq/n).
- The sampling distribution of p̂ is approximately normal when the sample size, n, is large enough
  (np ≥ 5 and nq ≥ 5).
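These three facts can be checked informally by simulation; the population proportion, sample size and number of replications below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, k = 0.3, 100, 100_000              # illustrative population proportion, sample size, #samples

x = rng.binomial(n, p, size=k)           # number of successes X in each sample
p_hat = x / n                            # k realizations of the sample proportion p̂

print("empirical mean of p̂:", p_hat.mean(), "  theory:", p)
print("empirical sd of p̂:", p_hat.std(), "  theory:", np.sqrt(p * (1 - p) / n))
# Here np = 30 and nq = 70 (both ≥ 5), so a histogram of p_hat is close to a normal curve.
```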
Example: Suppose a fair coin is tossed 64 times. Determine the probability that at most 30 heads are
obtained.
Solution: Let X be the number of heads obtained, and p = 0.5 the probability of obtaining a head in a toss.
Then X ~ Binomial(n = 64, p = 0.5). Since np = nq = 32 ≥ 5,
p̂ ~ Normal(μ = 0.5, σ² = (0.5)(0.5)/64 = 0.0039), approximately.
P(X ≤ 30) = P(X/n ≤ 30/64) = P(p̂ ≤ 0.4688)
          ≈ P(Z ≤ (0.4688 − 0.5)/√0.0039)
          = P(Z ≤ −0.4996) = 1 − P(Z ≤ 0.4996) = 1 − 0.6913 = 0.3087
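The same probability can be checked in Python; the exact binomial value comes out somewhat larger than the normal approximation because no continuity correction was applied above:

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 64, 0.5
se = sqrt(p * (1 - p) / n)                 # σ_p̂ = 0.0625

approx = norm.cdf((30 / 64 - p) / se)      # normal approximation ≈ 0.31
exact = binom.cdf(30, n, p)                # exact binomial P(X ≤ 30) ≈ 0.35
print(approx, exact)
```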