
Course Code: STA 241

Course Title: PROBABILITY AND DISTRIBUTION MODELS

http://ecampus.mmust.ac.ke
TOPIC 15: LAWS OF LARGE NUMBERS AND CENTRAL LIMIT THEOREM


Introduction
In this topic, we examine the laws of large numbers and the central limit theorem, and their applications.
The central limit theorem lets us make inferences about sample statistics and population parameters;
for example, it can tell us whether an observed sample is plausibly drawn from a given population.

Objectives
By the end of this topic, the learner should be able to:
1. State and prove Chebyshev's inequality
2. Obtain the weak and strong laws of large numbers
3. Derive the central limit theorem.
4. Solve practical (real life) problems using the knowledge of the central limit theorem

Learning Activities

 Students to take note of the activities and exercises provided within the text and at the end of the
topic.

Topic Resources

 Students to take note of the reference text books provided in the course outline.
 Learners to get e-learning materials from MMUST library and other links within their reach.


15.0: LAWS OF LARGE NUMBERS AND CENTRAL LIMIT THEOREM


15.1 Chebyshev’s inequality
Let X be a random variable with mean E[X] = μ and variance Var(X) = σ². Then, for any ε > 0, Chebyshev's inequality states that

P(|X − μ| ≥ ε) ≤ σ²/ε².

An equivalent form, obtained by the simple substitution ε = nσ, is

P(|X − μ| ≥ nσ) ≤ 1/n².

Proof of Chebyshev's inequality:

We need to show that P(|X − μ| ≥ ε) ≤ σ²/ε². Assume X is continuous with density f_X. Starting from the definition of the variance,

σ² = Var(X) = ∫_{−∞}^{∞} (t − μ)² f_X(t) dt

≥ ∫_{−∞}^{μ−ε} (t − μ)² f_X(t) dt + ∫_{μ+ε}^{∞} (t − μ)² f_X(t) dt,

where the last line follows by restricting the region over which we integrate a non-negative function.

On this restricted region we have |t − μ| ≥ ε, so (t − μ)² ≥ ε². Using this in the integrals,

σ² ≥ ∫_{−∞}^{μ−ε} ε² f_X(t) dt + ∫_{μ+ε}^{∞} ε² f_X(t) dt

= ε² [ P(X ≤ μ − ε) + P(X ≥ μ + ε) ]

= ε² P(|X − μ| ≥ ε).

Thus, σ² ≥ ε² P(|X − μ| ≥ ε).

Dividing by ε²,

P(|X − μ| ≥ ε) ≤ σ²/ε².

Hence the proof!
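The inequality is easy to check numerically. The sketch below (illustrative only, not part of the course text) draws a large Exp(1) sample, for which μ = 1 and σ² = 1 — the exponential choice is an arbitrary assumption — and compares the empirical tail frequency with the Chebyshev bound.

```python
import random

# Empirical check of Chebyshev's inequality: P(|X - mu| >= eps) <= sigma^2 / eps^2.
# X ~ Exp(1), so mu = 1 and sigma^2 = 1 (an arbitrary illustrative choice).
random.seed(0)
n = 200_000
xs = [random.expovariate(1.0) for _ in range(n)]
mu, var = 1.0, 1.0

for eps in (1.0, 2.0, 3.0):
    freq = sum(abs(x - mu) >= eps for x in xs) / n   # empirical tail frequency
    bound = var / eps ** 2                           # Chebyshev upper bound
    print(f"eps={eps}: empirical {freq:.4f} <= bound {bound:.4f}")
```

For Exp(1) the true tail probabilities are far below the bound, which illustrates that Chebyshev's inequality is crude but holds for every distribution with a finite variance.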

15.2 Law of Large Numbers


 The law of large numbers has a very central role in probability and statistics.
 It states that if you repeat an experiment independently a large number of times and average the results, what you obtain should be close to the expected value.
 There are two main versions of the law of large numbers:
a) the weak law of large numbers (WLLN), and
b) the strong law of large numbers (SLLN).
 Before discussing the WLLN, let us define the sample mean.

Definition 15.1: For i.i.d. random variables X₁, X₂, …, Xₙ with mean E[Xᵢ] = μ and variance Var(Xᵢ) = σ², the sample mean, denoted by X̄, is defined as

X̄ = (X₁ + … + Xₙ)/n.

Then

E[X̄] = E[(X₁ + … + Xₙ)/n] = (E[X₁] + … + E[Xₙ])/n = (μ + … + μ)/n = nμ/n = μ.

Also,

Var(X̄) = Var((X₁ + … + Xₙ)/n) = (Var(X₁) + … + Var(Xₙ))/n² = (σ² + … + σ²)/n² = nσ²/n² = σ²/n.

15.3 The weak law of large numbers (WLLN)


Theorem 15.1: Let X₁, X₂, …, Xₙ be i.i.d. random variables with finite expected value E[Xᵢ] = μ < ∞ and finite variance σ². Then for any ε > 0,

lim_{n→∞} P(|X̄ − μ| ≥ ε) = 0.

Proof:
By Chebyshev's inequality applied to X̄,

P(|X̄ − μ| ≥ ε) ≤ Var(X̄)/ε² = σ²/(nε²).

This goes to zero as n → ∞.
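The WLLN can be seen directly by simulation. In this sketch (illustrative only; the Bernoulli(0.5) trials, ε = 0.05, and repetition counts are assumptions), the frequency of the event |X̄ − μ| ≥ ε shrinks toward zero as n grows:

```python
import random

# WLLN in action: estimate P(|X-bar - mu| >= eps) for growing n.
# X_i ~ Bernoulli(0.5), so mu = 0.5 (an arbitrary choice); eps = 0.05.
random.seed(2)
eps, reps = 0.05, 2_000
freqs = []
for n in (10, 100, 1000):
    hits = sum(abs(sum(random.random() < 0.5 for _ in range(n)) / n - 0.5) >= eps
               for _ in range(reps))
    freqs.append(hits / reps)
    print(f"n={n:4d}: P(|X-bar - 0.5| >= {eps}) ~ {hits / reps:.3f}")
```

The printed frequencies decrease with n, exactly as Theorem 15.1 predicts.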

Activity 15.1: Read and make notes on the strong laws of the large numbers (SLLN).

15.4 The Central Limit Theorem (CLT)


 The central limit theorem (CLT) is one of the most important results in probability theory. It states that, under certain conditions, the sum of a large number of random variables is approximately normal. Here, we state a version of the CLT that applies to i.i.d. random variables.
 Suppose that X₁, …, Xₙ are i.i.d. random variables with expected value E[Xᵢ] = μ < ∞ and variance Var(Xᵢ) = σ² < ∞.


X 1  ....  X n 2
 Then the sample mean X  has mean E  X      and variance Var  X  
n n
X  n X  
Thus, the normalized random variable Z n  
2
n

Z n has mean E  Z n   0 and variance Var  Z n   1
The central limit theorem states that the CDF of Z n converges to the standard normal CDF.

Theorem 15.2: Let X₁, X₂, …, Xₙ be i.i.d. random variables with finite expected value E[Xᵢ] = μ < ∞ and variance 0 < Var(Xᵢ) = σ² < ∞. Then the random variable

Zₙ = (X̄ − μ)/(σ/√n) = (X₁ + … + Xₙ − nμ)/(√n σ)

converges in distribution to the standard normal random variable as n goes to infinity; that is,

lim_{n→∞} P(Zₙ ≤ x) = Φ(x) for all x ∈ ℝ, where Φ(x) is the standard normal CDF.

Remark 15.1: An interesting thing about the CLT is that it does not matter what the distribution of the Xᵢ's is: the Xᵢ's can be discrete, continuous, or mixed random variables.

Theorem 15.3: Let X₁, X₂, …, Xₙ be i.i.d. Bernoulli random variables with parameter p. Then E[Xᵢ] = p and Var(Xᵢ) = p(1 − p) = pq. Also, Yₙ = X₁ + … + Xₙ is a binomial random variable with parameters n and p, i.e. Yₙ ~ Bin(n, p). This implies that

Zₙ = (X̄ − p)/(√(p(1 − p))/√n) = (Yₙ − np)/√(np(1 − p)), where Yₙ ~ Bin(n, p).
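Theorem 15.3 (the normal approximation to the binomial) can be checked against the exact binomial CDF. The sketch below (illustrative only; the values n = 1000, p = 0.1, k = 110 are assumptions) compares the two.

```python
from math import comb, erf, sqrt

def phi(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Exact P(Y <= k) for Y ~ Bin(n, p) versus the CLT approximation
# Phi((k - n*p) / sqrt(n*p*(1-p))).
n, p, k = 1000, 0.1, 110
exact = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))
approx = phi((k - n * p) / sqrt(n * p * (1 - p)))
print(f"exact P(Y <= {k}) = {exact:.4f},  CLT approx = {approx:.4f}")
```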

Illustration 15.1: How to apply the Central Limit Theorem (CLT)


Here are the steps that we need in order to apply the CLT:
1) Write the random variable of interest, Y, as the sum of n i.i.d. random variables Xᵢ:
Y = X₁ + … + Xₙ.
2) Find E[Y] and Var(Y) by noting that
E[Y] = nμ and Var(Y) = nσ², where E[Xᵢ] = μ and Var(Xᵢ) = σ².
3) According to the CLT, conclude that
Zₙ = (Y − E[Y])/√Var(Y) = (Y − nμ)/(√n σ)
is approximately standard normal.
4) Thus, to find P(y₁ ≤ Y ≤ y₂), we can write
P(y₁ ≤ Y ≤ y₂) = P((y₁ − nμ)/(√n σ) ≤ Zₙ ≤ (y₂ − nμ)/(√n σ))
≈ Φ((y₂ − nμ)/(√n σ)) − Φ((y₁ − nμ)/(√n σ)).
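The four steps above can be wrapped in a small helper function. This is a sketch, not part of the course text; `clt_interval_prob` is a name chosen here for illustration, and the numbers in the usage line are those of Example 15.1 below.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def clt_interval_prob(n, mu, sigma2, y1, y2):
    """CLT approximation of P(y1 <= Y <= y2) for Y = X_1 + ... + X_n,
    where the X_i are i.i.d. with mean mu and variance sigma2."""
    mean_y = n * mu               # step 2: E[Y] = n mu
    sd_y = sqrt(n * sigma2)       # step 2: sd(Y) = sqrt(n sigma^2)
    z1 = (y1 - mean_y) / sd_y     # step 4: standardize the endpoints
    z2 = (y2 - mean_y) / sd_y
    return phi(z2) - phi(z1)      # step 4: Phi(z2) - Phi(z1)

# Usage, with the numbers of Example 15.1: n=50, mu=2, sigma^2=1
print(round(clt_interval_prob(50, 2.0, 1.0, 90, 110), 4))  # -> 0.8427
```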
Example 15.1: A bank teller serves customers standing in the queue one by one. Suppose that the service time Xᵢ for customer i has mean E[Xᵢ] = μ = 2 minutes and variance Var(Xᵢ) = σ² = 1. We assume that service times for different bank customers are independent. Let Y be the total time the bank teller spends serving 50 customers. Find P(90 ≤ Y ≤ 110).

Solution:
Let Y = X₁ + … + Xₙ, with n = 50, μ = 2 and σ² = 1. Then E[Y] = nμ = 100 and Var(Y) = nσ² = 50, so

P(90 ≤ Y ≤ 110) = P((90 − nμ)/(√n σ) ≤ Zₙ ≤ (110 − nμ)/(√n σ))
= P((90 − 100)/√50 ≤ Zₙ ≤ (110 − 100)/√50)
= P(−√2 ≤ Zₙ ≤ √2)
≈ Φ(√2) − Φ(−√2) = 0.8427.
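A Monte Carlo check of Example 15.1 (a sketch, not part of the course text). The Uniform(2 − √3, 2 + √3) service-time distribution is an assumption chosen only because it has mean 2 and variance 1; by the CLT the answer should be close to 0.8427 regardless of this choice.

```python
import random

# Monte Carlo estimate of P(90 <= Y <= 110), Y = sum of 50 i.i.d. service times
# with mean 2 and variance 1, drawn here from Uniform(2 - sqrt(3), 2 + sqrt(3)).
random.seed(3)
lo, hi = 2 - 3 ** 0.5, 2 + 3 ** 0.5
reps = 50_000
hits = sum(90 <= sum(random.uniform(lo, hi) for _ in range(50)) <= 110
           for _ in range(reps))
print(f"Monte Carlo estimate: {hits / reps:.4f} (CLT approximation: 0.8427)")
```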
Example 15.2: In a communication system, each data packet consists of 1000 bits. Due to noise, each bit may be received in error with probability 0.1. It is assumed that bit errors occur independently. Find the probability that there are more than 120 errors in a certain data packet.

Solution:
Let us define Xᵢ as the indicator variable for the ith bit in the packet: Xᵢ = 1 if the ith bit is received in error, and Xᵢ = 0 otherwise. Then the Xᵢ are i.i.d. with Xᵢ ~ Bernoulli(p = 0.1).
If Y is the total number of bit errors in the packet, we have Y = X₁ + … + Xₙ.
Since Xᵢ ~ Bernoulli(p = 0.1), we have n = 1000, E[Xᵢ] = μ = p = 0.1 and Var(Xᵢ) = σ² = p(1 − p) = 0.09. Then E[Y] = nμ = 100 and Var(Y) = nσ² = 90, so

P(Y > 120) = P(Zₙ > (120 − nμ)/(√n σ))
= P(Zₙ > (120 − 100)/√90)
≈ 1 − Φ(20/√90) = 0.0175.
Exercise 15.1: Let X₁, X₂, …, Xₙ be i.i.d. Exp(λ) random variables with λ = 1, and let X̄ = (X₁ + … + Xₙ)/n. How large should n be so that P(0.9 ≤ X̄ ≤ 1.1) ≥ 0.95?


Exercise 15.2: Data from the Framingham Heart Study found that subjects over age 50 had a mean HDL of
54 and a standard deviation of 17. Suppose a physician has 40 patients over age 50 and wants to determine
the probability that the mean HDL cholesterol for this sample of 40 patients is 60 mg/dl or more (i.e., low risk).
Probability questions about a sample mean can be addressed with the Central Limit Theorem, as long as the
sample size is sufficiently large. In this case n = 40, so the sample mean is likely to be approximately
normally distributed, and we can compute the probability that the sample mean HDL exceeds 60 by using the
standard normal distribution table.

The population mean is 54, but the question is: what is the probability that the sample mean will be greater than 60?

In general, the standard deviation of the sample mean is σ/√n.

Therefore, the formula to standardize a sample mean is

Z = (X̄ − μ)/(σ/√n),

and in this case

Z = (60 − 54)/(17/√40) = 6/2.688 ≈ 2.23.

P(Z > 2.23) can be looked up in the standard normal distribution table: P(Z > 2.23) = 1 − 0.9871 = 0.0129.

Therefore, the probability that the mean HDL in these 40 patients will exceed 60 is about 1.3%.

i) What is the probability that the mean HDL cholesterol among these 40 patients is less than 50?

Solution

Standardizing the sample mean: P(X̄ < 50) = P(Z < (50 − 54)/(17/√40)) = P(Z < −1.49).

From the standard normal distribution table, P(Z < −1.49) = 1 − 0.9319 = 0.0681, so there is about a 6.8% probability that the mean HDL in these 40 patients is less than 50.

For a single AFP measurement (Activity 15.2 below: AFP at 15-20 weeks gestation is normally distributed with mean 58 and standard deviation 18), we standardize: P(X > 75) = P(Z > (75 − 58)/18) = P(Z > 0.94). From the standard normal distribution table, P(Z > 0.94) = 1 − 0.8264 = 0.1736.

Therefore, there is about a 17% probability that AFP exceeds 75 in a pregnant woman measured at 18 weeks
gestation.
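Both HDL sample-mean probabilities (the mean exceeding 60, and the mean falling below 50) can be verified by direct arithmetic rather than a table lookup. This is a sketch only; small differences from the table values are due to rounding.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Framingham HDL numbers: mu = 54, sigma = 17, sample size n = 40.
mu, sigma, n = 54.0, 17.0, 40
se = sigma / sqrt(n)                  # standard deviation of the sample mean

p_high = 1 - phi((60 - mu) / se)      # P(sample mean > 60)
p_low = phi((50 - mu) / se)           # P(sample mean < 50)
print(f"P(X-bar > 60) ~ {p_high:.4f}")
print(f"P(X-bar < 50) ~ {p_low:.4f}")
```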

Example 15.3: Suppose we want to estimate the mean LDL cholesterol in the population of adults 65 years
of age and older. We know from studies of adults under age 65 that the standard deviation is 13, and we will
assume that the variability in LDL in adults 65 years of age and older is the same. We will select a sample of
n=100 participants > 65 years of age, and we will use the mean of the sample as an estimate of the
population mean. We want our estimate to be precise; specifically we want it to be within 3 units of the true
mean LDL value. What is the probability that our estimate (i.e., the sample mean) will be within 3 units of
the true mean? We think of this question as P(μ - 3 < sample mean < μ + 3).

Because this is a probability about a sample mean, we will use the Central Limit Theorem. With a sample of
size n=100 we clearly satisfy the sample size criterion so we can use the Central Limit Theorem and the
standard normal distribution table. The previous questions focused on specific values of the sample mean
(e.g., 50 or 60) and we converted those to Z scores and used the standard normal distribution table to find
the probabilities. Here the values of interest are μ - 3 and μ + 3. The solution can be set up as follows:

P(μ − 3 < X̄ < μ + 3) = P(−3/(σ/√n) < Z < 3/(σ/√n)) = P(−3/(13/√100) < Z < 3/(13/√100)) = P(−2.31 < Z < 2.31).

From the standard normal distribution table, P(Z < 2.31) = 0.98956 and P(Z < -2.31) = 0.01044. The
probability between these two values is P(-2.31 < Z < 2.31) = 0.98956 - 0.01044 = 0.9791. Therefore, there is a 97.91%
probability that the sample mean, based on a sample of size n=100, will be within 3 units of the true
population mean. This is a very powerful statement, because it means that for this question looking only at
100 individuals aged 65 or older gives us a very precise estimate of the population mean.
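Example 15.3's arithmetic can be checked in a few lines (a sketch, not part of the course text); note the answer depends only on σ, n and the margin, not on μ itself.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Example 15.3: sigma = 13, n = 100; chance the sample mean lies within 3 units of mu.
sigma, n, margin = 13.0, 100, 3.0
z = margin / (sigma / sqrt(n))        # 3 / 1.3, about 2.31
prob = phi(z) - phi(-z)
print(f"z = {z:.2f}, P(|X-bar - mu| < 3) ~ {prob:.4f}")
```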

Activity 15.2: Alpha fetoprotein (AFP) is a substance produced by a fetus that can be measured in pregnant
women to assess the probability of problems with fetal development. When measured at 15-20 weeks
gestation, AFP is normally distributed with a mean of 58 and a standard deviation of 18. What is the
probability that AFP exceeds 75 in a pregnant woman measured at 18 weeks gestation? In other words, what
is P(X > 75)?

Activity 15.3: In a sample of 50 women, what is the probability that their mean AFP exceeds 75? In other
words, what is P(X̄ > 75)?


Note that the first part of the question addresses the probability of observing a single woman with an AFP
exceeding 75, whereas the second part addresses the probability that the mean AFP in a sample of 50 women
exceeds 75.

