STA 241 Topic 14 Laws of Large Numbers (Corr)
http://ecampus.mmust.ac.ke
TOPIC 15: < LAWS OF LARGE NUMBERS
AND CENTRAL LIMIT THEOREM >
Objectives
By the end of this topic, the learner should be able to:
1. State and prove Chebyshev's inequality
2. Obtain the weak and strong laws of large numbers
3. Derive the central limit theorem
4. Solve practical (real life) problems using the knowledge of the central limit theorem
Learning Activities
Students to take note of the activities and exercises provided within the text and at the end of the
topic.
Topic Resources
Students to take note of the reference text books provided in the course outline.
Learners to get e-learning materials from MMUST library and other links within their reach.
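Proof of Chebyshev's inequality (continuous case): Let $X$ have density $f_X$, mean $\mu$ and variance $\sigma^2$, and let $\epsilon > 0$. The variance is
$$\sigma^2 = \int_{-\infty}^{\infty} (t-\mu)^2 f_X(t)\,dt.$$
Then this is
$$\ge \int_{-\infty}^{\mu-\epsilon} (t-\mu)^2 f_X(t)\,dt + \int_{\mu+\epsilon}^{\infty} (t-\mu)^2 f_X(t)\,dt.$$
Since $t \le \mu-\epsilon$ or $t \ge \mu+\epsilon$ implies $(t-\mu)^2 \ge \epsilon^2$, the right-hand side is
$$\ge \epsilon^2 \left[ \int_{-\infty}^{\mu-\epsilon} f_X(t)\,dt + \int_{\mu+\epsilon}^{\infty} f_X(t)\,dt \right] = \epsilon^2\, P(X \le \mu-\epsilon \text{ or } X \ge \mu+\epsilon) = \epsilon^2\, P(|X-\mu| \ge \epsilon).$$
Thus, $\sigma^2 \ge \epsilon^2\, P(|X-\mu| \ge \epsilon)$.
Dividing by $\epsilon^2$,
$$P(|X-\mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2}.$$
Hence the proof!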
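As a numerical illustration, the sketch below compares the Chebyshev bound $\sigma^2/\epsilon^2$ with the exact two-sided tail probability for a normal random variable. The normal distribution and the function names `chebyshev_bound` and `normal_two_sided_tail` are illustrative assumptions, not part of the course text.

```python
from statistics import NormalDist


def chebyshev_bound(sigma, eps):
    """Upper bound sigma^2 / eps^2 on P(|X - mu| >= eps)."""
    return sigma ** 2 / eps ** 2


def normal_two_sided_tail(sigma, eps):
    """Exact P(|X - mu| >= eps) when X is normal with standard deviation sigma."""
    return 2 * (1 - NormalDist().cdf(eps / sigma))


if __name__ == "__main__":
    sigma = 1.0  # illustrative choice of standard deviation
    for eps in (1.0, 2.0, 3.0):
        exact = normal_two_sided_tail(sigma, eps)
        bound = chebyshev_bound(sigma, eps)
        print(f"eps={eps}: exact tail={exact:.4f}, Chebyshev bound={bound:.4f}")
```

The bound holds for every distribution with finite variance, so it is usually much looser than the exact tail of any particular distribution.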
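Definition 15.1: For i.i.d. random variables $X_1, X_2, \dots, X_n$ with mean $\mu$ and variance $\sigma^2$, the sample mean, denoted by $\bar{X}$, is defined as
$$\bar{X} = \frac{X_1 + \cdots + X_n}{n}.$$
Its mean is
$$E[\bar{X}] = E\left[\frac{X_1 + \cdots + X_n}{n}\right] = \frac{E[X_1] + \cdots + E[X_n]}{n} = \frac{\mu + \cdots + \mu}{n} = \frac{n\mu}{n} = \mu.$$
Also,
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{\mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)}{n^2} = \frac{\sigma^2 + \cdots + \sigma^2}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$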
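Then for any $\epsilon > 0$, $\lim_{n \to \infty} P(|\bar{X} - \mu| \ge \epsilon) = 0$ (the weak law of large numbers).
Proof:
By Chebyshev's inequality applied to $\bar{X}$,
$$P(|\bar{X} - \mu| \ge \epsilon) \le \frac{\mathrm{Var}(\bar{X})}{\epsilon^2} = \frac{\mathrm{Var}(X)}{n\epsilon^2}.$$
This goes to zero as $n \to \infty$.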
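The limit above can be illustrated by simulation. The following is a minimal sketch, assuming fair coin flips (Bernoulli with $p = 0.5$) as the underlying distribution. The function names, the seed and the trial counts are hypothetical choices made for illustration only.

```python
import random


def deviation_frequency(n, eps, trials=2000, p=0.5, seed=0):
    """Estimate P(|sample mean - p| >= eps) for n Bernoulli(p) draws by simulation."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        if abs(mean - p) >= eps:
            count += 1
    return count / trials


def chebyshev_bound(n, eps, p=0.5):
    """Bound Var(X) / (n * eps^2) with Var(X) = p(1 - p), capped at 1."""
    return min(1.0, p * (1 - p) / (n * eps ** 2))


if __name__ == "__main__":
    eps = 0.05
    for n in (10, 100, 1000):
        print(n, deviation_frequency(n, eps), chebyshev_bound(n, eps))
```

Both the simulated frequency and the Chebyshev bound shrink as `n` grows, which is the content of the weak law.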
Activity 15.1: Read and make notes on the strong law of large numbers (SLLN).
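Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with mean $\mu$ and variance $\sigma^2$.
Then the sample mean $\bar{X} = \frac{X_1 + \cdots + X_n}{n}$ has mean $E[\bar{X}] = \mu$ and variance $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$.
Thus, the normalized random variable
$$Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$$
has mean $E[Z_n] = 0$ and variance $\mathrm{Var}(Z_n) = 1$.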
The central limit theorem states that the CDF of Z n converges to the standard normal CDF.
Theorem 15.2: Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables with finite expected value $E[X_i] = \mu$ and
variance $0 < \mathrm{Var}(X_i) = \sigma^2 < \infty$. Then, the random variable
$$Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
converges in distribution to the standard normal random variable as $n$ goes to infinity, that is
$$\lim_{n \to \infty} P(Z_n \le x) = \Phi(x) \quad \text{for all } x \in \mathbb{R},$$
where $\Phi(x)$ is the standard normal CDF.
Remark 15.1: An interesting thing about the CLT is that it does not matter what the distribution of the $X_i$'s
is, as long as the mean and variance are finite. The $X_i$'s can be discrete, continuous, or mixed random variables.
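To see the CLT at work on a clearly non-normal distribution, the sketch below simulates the standardized sample mean of exponential random variables and compares its empirical CDF at one point with the standard normal CDF. The choice of Exp(1), the sample size, the seed and the function names are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def standardized_mean(n, rng):
    """Draw n Exp(1) values (mean 1, variance 1) and return (mean - 1) / (1 / sqrt(n))."""
    mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (mean - 1.0) / (1.0 / math.sqrt(n))


def empirical_cdf_at(x, n, trials=10000, seed=1):
    """Fraction of simulated standardized means that are <= x."""
    rng = random.Random(seed)
    return sum(standardized_mean(n, rng) <= x for _ in range(trials)) / trials


if __name__ == "__main__":
    x = 1.0
    print("simulated P(Z_n <= 1):", empirical_cdf_at(x, n=30))
    print("standard normal CDF at 1:", round(NormalDist().cdf(x), 4))
```

With `n = 30` the two values should already be close, even though the exponential distribution is strongly skewed.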
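Theorem 15.3: Let $X_1, X_2, \dots, X_n$ be i.i.d. Bernoulli random variables with parameter $p$. Then
$E[X_i] = \mu = p$ and $\mathrm{Var}(X_i) = \sigma^2 = p(1-p) = pq$. Also, $Y_n = X_1 + \cdots + X_n$ is a binomial random variable with
parameters $n$ and $p$, i.e. $Y_n \sim \mathrm{Bin}(n, p)$. This implies that
$$Z_n = \frac{Y_n - np}{\sqrt{np(1-p)}}, \quad \text{where } Y_n \sim \mathrm{Bin}(n, p),$$
converges in distribution to the standard normal random variable as $n \to \infty$.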
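In practice, for $Y_n = X_1 + \cdots + X_n$ and large $n$, the CLT gives the approximation
$$P(y_1 \le Y_n \le y_2) \approx \Phi\left(\frac{y_2 - n\mu}{\sigma\sqrt{n}}\right) - \Phi\left(\frac{y_1 - n\mu}{\sigma\sqrt{n}}\right).$$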
Example 15.1: A bank teller serves customers standing in the queue one by one. Suppose that the service
time $X_i$ for customer $i$ has mean $E[X_i] = \mu = 2$ minutes and $\mathrm{Var}(X_i) = \sigma^2 = 1$. We assume that service
times for different bank customers are independent. Let $Y$ be the total time the bank teller spends
serving 50 customers. Find $P(90 < Y < 110)$.
Solution:
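Let $Y_n = X_1 + \cdots + X_n$, with $n = 50$, $E[X_i] = \mu = 2$ and $\mathrm{Var}(X_i) = \sigma^2 = 1$, so that $n\mu = 100$ and $\sigma\sqrt{n} = \sqrt{50}$.
$$P(90 < Y < 110) = P\left(\frac{90 - n\mu}{\sigma\sqrt{n}} < \frac{Y - n\mu}{\sigma\sqrt{n}} < \frac{110 - n\mu}{\sigma\sqrt{n}}\right)$$
$$= P\left(\frac{90 - 100}{\sqrt{50}} < Z < \frac{110 - 100}{\sqrt{50}}\right)$$
$$= P(-\sqrt{2} < Z < \sqrt{2})$$
$$\approx \Phi(\sqrt{2}) - \Phi(-\sqrt{2}) = 0.8427$$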
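The arithmetic in Example 15.1 can be checked with a few lines of Python. This is a minimal sketch: the helper name `clt_sum_probability` is hypothetical, and `statistics.NormalDist` from the standard library stands in for the normal table.

```python
import math
from statistics import NormalDist


def clt_sum_probability(lo, hi, n, mu, sigma):
    """Approximate P(lo < Y_n < hi) for Y_n = X_1 + ... + X_n using the CLT."""
    scale = sigma * math.sqrt(n)   # standard deviation of the sum
    phi = NormalDist().cdf         # standard normal CDF
    return phi((hi - n * mu) / scale) - phi((lo - n * mu) / scale)


# Bank teller example: n = 50 customers, mean 2 minutes, variance 1
print(round(clt_sum_probability(90, 110, n=50, mu=2, sigma=1), 4))  # 0.8427
```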
Example 15.2: In a communication system, each data packet consists of 1000 bits. Due to the noise, each bit
may be received in error with probability 0.1. It is assumed bit errors occur independently. Find the
probability that there are more than 120 errors in a certain data packet.
Solution:
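Let us define $X_i$ as the indicator variable for the $i$th bit in the packet. That is, $X_i = 1$ if the $i$th bit
is received in error, and $X_i = 0$ otherwise. Then the $X_i$ are i.i.d. and $X_i \sim \mathrm{Bernoulli}(p = 0.1)$.
If $Y$ is the total number of bit errors in the packet, we have $Y = X_1 + \cdots + X_n$.
Since $X_i \sim \mathrm{Bernoulli}(p = 0.1)$, we have $n = 1000$, $E[X_i] = \mu = p = 0.1$ and
$\mathrm{Var}(X_i) = \sigma^2 = p(1-p) = 0.09$, so that $n\mu = 100$ and $\sigma\sqrt{n} = \sqrt{90}$.
Thus,
$$P(Y > 120) = P\left(\frac{Y - n\mu}{\sigma\sqrt{n}} > \frac{120 - n\mu}{\sigma\sqrt{n}}\right) = P\left(Z_n > \frac{120 - 100}{\sqrt{90}}\right) \approx 1 - \Phi\left(\frac{20}{\sqrt{90}}\right) = 0.0175$$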
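For Example 15.2 the normal approximation can be compared with the exact binomial tail. The sketch below assumes no continuity correction (as in the worked solution). The function names are hypothetical, and the exact sum is computed in log space with `math.lgamma` to avoid overflow.

```python
import math
from statistics import NormalDist


def normal_tail(k, n, p):
    """CLT approximation to P(Y > k) for Y ~ Bin(n, p), without continuity correction."""
    z = (k - n * p) / math.sqrt(n * p * (1 - p))
    return 1 - NormalDist().cdf(z)


def exact_tail(k, n, p):
    """Exact P(Y > k) for Y ~ Bin(n, p), summing the pmf computed in log space."""
    total = 0.0
    for j in range(k + 1, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total


if __name__ == "__main__":
    print("normal approximation:", round(normal_tail(120, 1000, 0.1), 4))
    print("exact binomial tail: ", round(exact_tail(120, 1000, 0.1), 4))
```

The two values need not agree to many decimal places, since the binomial distribution is discrete and slightly skewed for $p = 0.1$.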
Exercise 15.1: Let $X_1, X_2, \dots, X_n$ be i.i.d. $\mathrm{Exp}(\lambda)$ random variables with $\lambda = 1$. Let $\bar{X} = \frac{X_1 + \cdots + X_n}{n}$. How
large should $n$ be such that $P(0.9 \le \bar{X} \le 1.1) \ge 0.95$?
Exercise 15.2: Data from the Framingham Heart Study found that subjects over age 50 had a mean HDL of
54 and a standard deviation of 17. Suppose a physician has 40 patients over age 50 and wants to determine
the probability that the mean HDL cholesterol for this sample of 40 men is 60 mg/dl or more (i.e., low risk).
Probability questions about a sample mean can be addressed with the Central Limit Theorem, as long as the
sample size is sufficiently large. In this case n=40, so the sample mean is likely to be approximately
normally distributed, so we can compute the probability of HDL>60 by using the standard normal
distribution table.
The population mean is 54, but the question is what is the probability that the sample mean will be >60?
In general, Z = (X̄ - μ) / (σ/√n), so here Z = (60 - 54) / (17/√40) ≈ 2.23.
P(Z > 2.23) can be found from the standard normal distribution table: the table gives P(Z < 2.23) = 0.9871,
so we compute it as P(Z > 2.23) = 1 - 0.9871 = 0.0129.
Therefore, the probability that the mean HDL in these 40 patients will exceed 60 is about 1.29%.
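The HDL calculation can be reproduced without a table. This is a minimal sketch using the standard library. The helper name `sample_mean_upper_tail` is hypothetical, and the unrounded z-score (near 2.23) is used, so the result is a tail probability of roughly 1.3% that may differ from a table lookup in the last digit.

```python
import math
from statistics import NormalDist


def sample_mean_upper_tail(threshold, mu, sigma, n):
    """Return (z, P(sample mean > threshold)) using the CLT approximation."""
    z = (threshold - mu) / (sigma / math.sqrt(n))
    return z, 1 - NormalDist().cdf(z)


# Framingham HDL example: mu = 54, sigma = 17, n = 40 patients, threshold 60
z, p = sample_mean_upper_tail(60, mu=54, sigma=17, n=40)
print(f"z = {z:.3f}, P(mean > 60) = {p:.4f}")
```

Calling the same function with `1 - p` in mind, or with a different threshold such as 50, handles the follow-up question about the mean being below a given value.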
i) What is the probability that the mean HDL cholesterol among these 40 patients is less than 50?
Solution (to the AFP question in Activity 15.2)
Because AFP is normally distributed with mean 58 and standard deviation 18, we standardize: P(X > 75) = P(Z > (75 - 58)/18) = P(Z > 17/18) = P(Z > 0.94).
From the standard normal distribution table, P(X > 75) = P(Z > 0.94) = 1 - 0.8264 = 0.1736.
Therefore, there is a 17% probability that AFP exceeds 75 in a pregnant woman measured at 18 weeks
gestation.
Example 15.3: Suppose we want to estimate the mean LDL cholesterol in the population of adults 65 years
of age and older. We know from studies of adults under age 65 that the standard deviation is 13, and we will
assume that the variability in LDL in adults 65 years of age and older is the same. We will select a sample of
n=100 participants aged 65 years or older, and we will use the mean of the sample as an estimate of the
population mean. We want our estimate to be precise; specifically we want it to be within 3 units of the true
mean LDL value. What is the probability that our estimate (i.e., the sample mean) will be within 3 units of
the true mean? We think of this question as P(μ - 3 < sample mean < μ + 3).
Because this is a probability about a sample mean, we will use the Central Limit Theorem. With a sample of
size n=100 we clearly satisfy the sample size criterion so we can use the Central Limit Theorem and the
standard normal distribution table. The previous questions focused on specific values of the sample mean
(e.g., 50 or 60) and we converted those to Z scores and used the standard normal distribution table to find
the probabilities. Here the values of interest are μ - 3 and μ + 3. The solution can be set up as follows:
P(μ - 3 < sample mean < μ + 3) = P(-3 / (13/√100) < Z < 3 / (13/√100)) = P(-2.31 < Z < 2.31).
From the standard normal distribution table, P(Z < 2.31) = 0.98956 and P(Z < -2.31) = 0.01044. The probability
between these two is P(-2.31 < Z < 2.31) = 0.98956 - 0.01044 = 0.9791. Therefore, there is a 97.91%
probability that the sample mean, based on a sample of size n=100, will be within 3 units of the true
population mean. This is a very powerful statement, because it means that for this question looking only at
100 individuals aged 65 or older gives us a very precise estimate of the population mean.
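The same calculation can be scripted to see how the precision depends on the sample size. This is a rough sketch: the helper name `within_margin_probability` and the alternative sample sizes 25 and 400 are illustrative assumptions. With the unrounded z-score the n=100 result is about 0.979, in line with the table-based value.

```python
import math
from statistics import NormalDist


def within_margin_probability(margin, sigma, n):
    """Approximate P(|sample mean - mu| < margin) using the CLT."""
    z = margin / (sigma / math.sqrt(n))
    return 2 * NormalDist().cdf(z) - 1


# LDL example: sigma = 13, margin of 3 units, several sample sizes
for n in (25, 100, 400):
    print(n, round(within_margin_probability(3, sigma=13, n=n), 4))
```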
Activity 15.2: Alpha fetoprotein (AFP) is a substance produced by a fetus that can be measured in pregnant
women to assess the probability of problems with fetal development. When measured at 15-20 weeks
gestation, AFP is normally distributed with a mean of 58 and a standard deviation of 18. What is the
probability that AFP exceeds 75 in a pregnant woman measured at 18 weeks gestation? In other words, what
is P(X > 75)?
Activity 15.3: In a sample of 50 women, what is the probability that their mean AFP exceeds 75? In other
words, what is P(X̄ > 75), where X̄ is the sample mean?
Note that the first part of the question addresses the probability of observing a single woman with an AFP
exceeding 75, whereas the second part of the question addresses the probability that the mean AFP in a
sample of 50 women exceeds 75.