6.1 Central Limit Theorem
MODULE 6
6.1 CENTRAL LIMIT THEOREM
6.1.1 Introduction
The central limit theorem (CLT for short) is one of the most powerful and useful ideas
in all of statistics. There are two alternative forms of the theorem, and both are
concerned with drawing finite samples of size n from a population with a known mean,
μ, and a known standard deviation, σ. The first form says that if we collect samples
of size n with a "large enough n," calculate each sample's mean, and create a
histogram of those means, then the resulting histogram will tend to have an
approximately normal bell shape. The second form says that if we again collect
samples of size n that are "large enough," calculate the sum of each sample, and
create a histogram of those sums, then the resulting histogram will again tend to
have a normal bell shape.
In either case, it does not matter what the distribution of the original population is,
and you do not even need to know it. The important fact is that the distributions of
the sample means and of the sample sums tend to follow the normal distribution.
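To see this numerically, here is a small simulation sketch (not part of the source notes). It draws repeated samples from a deliberately skewed population, assumed here to be exponential with mean 2 and standard deviation 2, and compares the sample means and sample sums with the standard CLT predictions: sample means centred at μ with standard deviation σ/√n, and sample sums centred at nμ with standard deviation σ√n. The population, the sample size N = 40, the number of samples and the seed are all illustrative assumptions.

```python
import random
import statistics

# Assumed skewed population: exponential with mean 2, so mu = 2 and sigma = 2.
MU, SIGMA = 2.0, 2.0
N = 40           # sample size: number of values averaged or summed
TRIALS = 20_000  # number of samples drawn

rng = random.Random(2024)  # fixed seed so the run is repeatable

means, sums = [], []
for _ in range(TRIALS):
    sample = [rng.expovariate(1 / MU) for _ in range(N)]  # rate = 1/mean
    total = sum(sample)
    sums.append(total)
    means.append(total / N)

# Share of sample means within 1.96 standard errors of mu
# (about 0.95 if the sample means are approximately normal).
se = SIGMA / N ** 0.5
inside = sum(abs(m - MU) < 1.96 * se for m in means) / TRIALS

print("mean of X-bar:", round(statistics.fmean(means), 3), "predicted", MU)
print("SD of X-bar:  ", round(statistics.stdev(means), 3), "predicted", round(se, 3))
print("mean of sums: ", round(statistics.fmean(sums), 2), "predicted", N * MU)
print("SD of sums:   ", round(statistics.stdev(sums), 2), "predicted", round(SIGMA * N ** 0.5, 2))
print("share within 1.96 SE:", round(inside, 3))
```

Even though individual exponential values are strongly right-skewed, the simulated means and sums should land close to the predicted centres and spreads.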
The sample size n required to be "large enough" depends on the original population
from which the samples are drawn (as a rule of thumb, the sample size should be at
least 30, or the data should come from a normal distribution). If the original
population is far from normal, then more observations are needed for the sample
means or sums to be approximately normal. Sampling is done with replacement.
If you draw random samples of size n, then as n increases, the random variable
$\bar{X}$, which consists of sample means, tends to be normally distributed with

$$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n}}\right)$$
The central limit theorem for sample means says that if you keep drawing larger and
larger samples (such as rolling one, two, five, and finally ten dice) and calculating
their means, the sample means form their own approximately normal distribution (the
sampling distribution).
These notes include material from Introductory Statistics by B. Illowsky, S. Dean ...
Access for free at https://openstax.org/details/books/introductory-statistics
Licensed by OpenStax under Creative Commons Attribution License v4.0
MATH10064
This normal distribution has the same mean as the original distribution and a
standard deviation equal to the original standard deviation divided by the square
root of the sample size n.
The variable n is the number of values that are averaged together, not the number of
times the experiment is done.
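The dice example can be simulated directly. The sketch below (an illustration, not from the source) averages n fair dice for n = 1, 2, 5 and 10, repeats each experiment TRIALS times, and compares the spread of the sample means with σ/√n. It also keeps the two counts separate: n is how many dice are averaged, while TRIALS is how many times the experiment is done. The seed and TRIALS value are arbitrary assumptions.

```python
import random
import statistics

FACES = range(1, 7)               # one fair six-sided die
MU = statistics.fmean(FACES)      # 3.5
SIGMA = statistics.pstdev(FACES)  # sqrt(35/12), about 1.708

rng = random.Random(7)  # fixed seed for repeatability
TRIALS = 10_000         # times the experiment is repeated (NOT n)

results = {}
for n in (1, 2, 5, 10):  # n = number of dice averaged together
    means = [statistics.fmean(rng.choices(FACES, k=n)) for _ in range(TRIALS)]
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:2d}  mean={results[n][0]:.3f}  sd={results[n][1]:.3f}  "
          f"sigma/sqrt(n)={SIGMA / n ** 0.5:.3f}")
```

The mean column should stay near 3.5 for every n, while the sd column should shrink roughly in line with σ/√n.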
To put it more formally, if you draw random samples of size n, the distribution of the
random variable $\bar{X}$, which consists of sample means, is called the sampling
distribution of the mean.
The sampling distribution of the mean approaches a normal distribution as n, the
sample size, increases.
The standard deviation of $\bar{X}$,

$$\frac{\sigma_X}{\sqrt{n}},$$

is called the standard error of the mean.
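As a minimal sketch, the normal approximation for $\bar{X}$ can be built with Python's standard-library `statistics.NormalDist`. The helper names `standard_error` and `sampling_distribution` are hypothetical, introduced only for this illustration; the example values reuse the population of Ex. 1 (mean 90, standard deviation 15, n = 25).

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sigma: float, n: int) -> float:
    """Hypothetical helper: standard deviation of the sample mean, sigma / sqrt(n)."""
    return sigma / sqrt(n)

def sampling_distribution(mu: float, sigma: float, n: int) -> NormalDist:
    """Hypothetical helper: CLT normal approximation X-bar ~ N(mu, sigma / sqrt(n))."""
    return NormalDist(mu, standard_error(sigma, n))

xbar = sampling_distribution(90, 15, 25)
print(xbar.mean, xbar.stdev)  # 90.0 3.0
```

The returned `NormalDist` object provides `cdf` for probabilities and `inv_cdf` for percentiles of the sample mean.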
Ex. 1
An unknown distribution has a mean of 90 and a standard deviation of 15. Samples
of size n = 25 are drawn randomly from the population.
a) Find the probability that the sample mean is between 85 and 92.
b) Find the value that is two standard deviations above the expected value, 90, of
the sample mean.
>>
a) P = 0.6997 b) Value is 96
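A short check of Ex. 1, assuming Python 3.8+ for `statistics.NormalDist`. By the CLT, $\bar{X}$ is approximately N(90, 15/√25) = N(90, 3), so part a) is a difference of two CDF values and part b) adds two standard errors to 90.

```python
from statistics import NormalDist

mu, sigma, n = 90, 15, 25
xbar = NormalDist(mu, sigma / n ** 0.5)  # X-bar ~ N(90, 3) by the CLT

# a) P(85 < X-bar < 92) = cdf(92) - cdf(85)
p = xbar.cdf(92) - xbar.cdf(85)
# b) two standard deviations of X-bar above its expected value
value = mu + 2 * xbar.stdev

print(round(p, 4), value)  # 0.6997 96.0
```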
Ex. 2
An unknown distribution has a mean of 45 and a standard deviation of eight.
Samples of size n = 30 are drawn randomly from the population. Find the probability
that the sample mean is between 42 and 50.
Ex. 3
The length of time, in hours, it takes an "over 40" group of people to play one soccer
match is normally distributed with a mean of two hours and a standard deviation of
0.5 hours. A sample of size n = 50 is drawn randomly from the population. Find the
probability that the sample mean is between 1.8 hours and 2.3 hours.
>>
P = 0.9977
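A sketch of the Ex. 3 calculation with `statistics.NormalDist`. Because the population here is stated to be normal, $\bar{X}$ is normal for any n, with standard deviation 0.5/√50 ≈ 0.0707 hours.

```python
from statistics import NormalDist

mu, sigma, n = 2, 0.5, 50
xbar = NormalDist(mu, sigma / n ** 0.5)  # standard error about 0.0707 hours

# P(1.8 < X-bar < 2.3)
p = xbar.cdf(2.3) - xbar.cdf(1.8)
print(round(p, 4))
```

The printed value should agree with the stated answer to about three decimal places; small differences in the last digit come from rounding of intermediate z-scores.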
Ex. 4
The length of time taken on the SAT for a group of students is normally distributed
with a mean of 2.5 hours and a standard deviation of 0.25 hours. A sample of size
n = 60 is drawn randomly from the population. Find the probability that the sample
mean is between two hours and three hours.
Ex. 5
In a study reported on the Flurry Blog on Oct. 29, 2012, the mean age of tablet
users is 34 years. Suppose the standard deviation is 15 years. Take a sample of size
n = 100.
a) What are the mean and standard deviation for the sample mean ages of tablet
users?
b) What does the distribution look like?
c) Find the probability that the sample mean age is more than 30 years (the
reported mean age of tablet users in this particular study).
d) Find the 95th percentile for the sample mean age (to one decimal place).
>>
a) Mean = 34 years; standard deviation = 15/√100 = 1.5 years
b) Approximately normal: the central limit theorem states that for large sample sizes
n, the sampling distribution of the sample mean is approximately normal.
c) 0.9962
d) 36.5
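The Ex. 5 answers can be reproduced with `statistics.NormalDist`, using the CLT approximation $\bar{X}$ ~ N(34, 15/√100) = N(34, 1.5). `inv_cdf(0.95)` returns the 95th percentile of that distribution.

```python
from statistics import NormalDist

mu, sigma, n = 34, 15, 100
xbar = NormalDist(mu, sigma / n ** 0.5)   # a) mean 34, SD 1.5

p_over_30 = 1 - xbar.cdf(30)              # c) P(X-bar > 30)
p95 = xbar.inv_cdf(0.95)                  # d) 95th percentile of X-bar

print(xbar.mean, xbar.stdev)              # 34.0 1.5
print(round(p_over_30, 4), round(p95, 1)) # 0.9962 36.5
```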
Ex. 6
The mean number of minutes for app engagement by a tablet user is 8.2 minutes.
Suppose the standard deviation is one minute. Take a sample of 60.
a) What are the mean and standard deviation for the sample mean number of minutes
of app engagement by a tablet user?
b) What is the standard error of the mean?
c) Find the 90th percentile for the sample mean time for app engagement for a
tablet user. Interpret this value in a complete sentence.
d) Find the probability that the sample mean is between eight minutes and 8.5
minutes.
>>
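a) Mean = 8.2 minutes; standard deviation = 1/√60 ≈ 0.13 minutes
b) 1/√60 ≈ 0.13 minutes (the standard error is the standard deviation of the sample mean)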
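c) 8.37 minutes: 90% of the sample mean engagement times for samples of 60 tablet
users are at most about 8.37 minutes.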
d) 0.9293
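A minimal check of Ex. 6 with `statistics.NormalDist`. It confirms that the standard error is σ/√n = 1/√60 ≈ 0.13, and computes the 90th percentile and the probability under the CLT approximation $\bar{X}$ ~ N(8.2, 1/√60).

```python
from statistics import NormalDist

mu, sigma, n = 8.2, 1, 60
se = sigma / n ** 0.5                  # a), b) standard error, about 0.129
xbar = NormalDist(mu, se)

p90 = xbar.inv_cdf(0.90)               # c) 90th percentile of X-bar
p = xbar.cdf(8.5) - xbar.cdf(8)        # d) P(8 < X-bar < 8.5)

print(round(se, 2), round(p90, 2))     # 0.13 8.37
print(round(p, 4))
```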
Ex. 7
An article on the Flurry Blog identifies a gaming marketing gap for men between the
ages of 30 and 40. You are researching a startup game targeted at the 35-year-old
demographic. Your idea is to develop a strategy game that can be played by men from
their late 20s through their late 30s. Based on the article’s data, industry research
shows that the average strategy player is 28 years old with a standard deviation of
4.8 years. You take a sample of 100 randomly selected gamers. If your target market
is 29- to 35-year-olds, should you continue with your development strategy?
Ex. 8
Cans of a cola beverage claim to contain 16 ounces. The amounts in a sample are
measured and the statistics are n = 34, x̄ = 16.01 ounces. If the cans are filled so that
μ = 16.00 ounces (as labeled) and σ = 0.143 ounces, find the probability that a
sample of 34 cans will have an average amount greater than 16.01 ounces. Do the
results suggest that cans are filled with an amount greater than 16 ounces?
Ex. 9
Based on data from the National Health Survey, women between the ages of 18 and
24 have an average systolic blood pressure (in mm Hg) of 114.8 with a standard
deviation of 13.1. Systolic blood pressure for women between the ages of 18 and 24
follows a normal distribution.
a) If one woman from this population is randomly selected, find the probability
that her systolic blood pressure is greater than 120.
b) If 40 women from this population are randomly selected, find the probability
that their mean systolic blood pressure is greater than 120.
c) If the sample were four women between the ages of 18 and 24 and we did not
know the original distribution, could the central limit theorem be used?
Ex. 10
A study was done about violence against prostitutes and the symptoms of
posttraumatic stress that they developed. The age range of the prostitutes was 14 to
61. The mean age was 30.9 years with a standard deviation of nine years.
a) In a sample of 25 prostitutes, what is the probability that the mean age of the
prostitutes is less than 35?
b) Is it likely that the mean age of the sample group could be more than 50 years?
Interpret the results.
c) Find the 95th percentile for the sample mean age of 65 prostitutes. Interpret
the results.
>>
a) 0.9886
b) P ≈ 0 => For this sample group, it is almost impossible for the group’s average
age to be more than 50. However, it is still possible for an individual in this
group to have an age greater than 50.
c) The 95th percentile = 32.7. This indicates that 95% of all samples of 65
prostitutes have a mean age below 32.7 years.
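The Ex. 10 results can be checked with `statistics.NormalDist`, using a standard error of 9/√25 = 1.8 for part a) and b) and 9/√65 ≈ 1.116 for part c). For part b) the z-score is about 10.6, so `1 - cdf(50)` evaluates to essentially zero in floating point.

```python
from statistics import NormalDist

mu, sigma = 30.9, 9
xbar25 = NormalDist(mu, sigma / 25 ** 0.5)  # SE = 1.8 for n = 25
xbar65 = NormalDist(mu, sigma / 65 ** 0.5)  # SE about 1.116 for n = 65

p_under_35 = xbar25.cdf(35)        # a) P(X-bar < 35)
p_over_50 = 1 - xbar25.cdf(50)     # b) z about 10.6, essentially 0
p95_65 = xbar65.inv_cdf(0.95)      # c) 95th percentile of X-bar, n = 65

print(round(p_under_35, 4), p_over_50 < 1e-6, round(p95_65, 1))  # 0.9886 True 32.7
```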