Stab22 Lecture8
The probability of any event is the area under the density curve and above the values
of X that make up the event.
Normal Distributions
Let's look at the examples from the previous lecture:
Example: binomial distribution, n = 500, p = 0.4
X is approximately $N\left(np, \sqrt{np(1-p)}\right)$
$\hat{p}$ is approximately $N\left(p, \sqrt{p(1-p)/n}\right)$
As a rule of thumb, we will use this approximation for values of n and p that satisfy
$np \ge 10$ and $n(1-p) \ge 10$.
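A minimal sketch of the rule of thumb and the approximating Normal parameters, using the n = 500, p = 0.4 example. The function names are illustrative, not from the lecture, and the second Normal parameter is taken to be the standard deviation.

```python
from math import sqrt

def rule_of_thumb_ok(n, p):
    """True when both np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def normal_approx_params(n, p):
    """Return ((mean, sd) for the count X, (mean, sd) for the proportion p-hat)."""
    count = (n * p, sqrt(n * p * (1 - p)))
    proportion = (p, sqrt(p * (1 - p) / n))
    return count, proportion

n, p = 500, 0.4
(count_mean, count_sd), (prop_mean, prop_sd) = normal_approx_params(n, p)

print("rule of thumb satisfied:", rule_of_thumb_ok(n, p))
print("X     approx N(%.1f, %.3f)" % (count_mean, count_sd))
print("p-hat approx N(%.2f, %.4f)" % (prop_mean, prop_sd))
```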
Example: Let's compare the Normal approximation with the exact calculation.
), ̂ ) (software)
Example: The audit described in the example from the previous lecture
examined an SRS of 150 sales records for compliance with sales tax laws. In fact, 8% of
all the company's sales records have an incorrect sales tax classification. The count X of
bad records in the sample has approximately the Bin(150, 0.08) distribution.
) (software)
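The comparison between the exact binomial probability and the Normal approximation can be sketched as follows. The event in the notes was lost, so the cutoff `k = 10` below is a hypothetical choice made only for illustration; `binom_cdf` is an illustrative helper, not a library function.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p), summing the binomial probabilities."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 150, 0.08
k = 10  # ASSUMPTION: hypothetical cutoff, not taken from the notes

exact = binom_cdf(k, n, p)

# Normal approximation N(np, sqrt(np(1-p))), with no continuity correction
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k)

print("exact  P(X <= %d) = %.4f" % (k, exact))
print("Normal P(X <= %d) = %.4f" % (k, approx))
```

Here np = 12 and n(1 - p) = 138, so the rule of thumb is satisfied, yet the two values can still differ noticeably for a count this close to the mean.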
Continuity Correction
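The figure below illustrates an idea that greatly improves the accuracy of the Normal
approximation to binomial probabilities: treat each whole number k as occupying the
interval from k - 0.5 to k + 0.5, so that P(X ≤ k) is approximated by the Normal
probability up to k + 0.5.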
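A small sketch of the continuity correction, assuming the Bin(150, 0.08) audit setting and a hypothetical cutoff of 10 bad records (the cutoff is an assumption for illustration). The function name and the `correct` flag are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx_leq(k, n, p, correct=True):
    """Normal approximation to P(X <= k) for X ~ Bin(n, p).

    With correct=True the cutoff is moved to k + 0.5 (continuity correction).
    """
    cutoff = k + 0.5 if correct else k
    return NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(cutoff)

def exact_leq(k, n, p):
    """Exact binomial P(X <= k)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p, k = 150, 0.08, 10  # k = 10 is a hypothetical cutoff
plain = normal_approx_leq(k, n, p, correct=False)
corrected = normal_approx_leq(k, n, p, correct=True)
exact = exact_leq(k, n, p)

print("exact          %.4f" % exact)
print("no correction  %.4f" % plain)
print("with +0.5      %.4f" % corrected)
```

For an event of the form X ≥ k the same idea applies with the cutoff moved to k - 0.5.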
Sampling Distribution of Sample Mean
Sample means are among the most common statistics, and we are often interested in their
sampling distribution.
The figure below shows (a) the distribution of lengths of all customer service calls
received by a bank in a month; (b) the distribution of the sample means $\bar{x}$ for
500 random samples of size 80 from this population.
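Facts about sample means:
Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ be the mean of an SRS of size $n$ from a population with mean $\mu$ and standard deviation $\sigma$.
Then
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$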
Why?
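Both facts follow from the rules for means and variances: the mean of a sum is the sum of the means, and for independent observations the variance of a sum is the sum of the variances, so the variance of the sample mean is σ²/n. The simulation below is only an empirical check of those two facts, under the assumption of a Uniform(0, 1) population (mean 0.5, standard deviation √(1/12)); the population, sample size, and seed are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(0)           # fixed seed so the run is repeatable
n, reps = 25, 20000              # sample size and number of simulated samples

# ASSUMED population: Uniform(0, 1)
mu, sigma = 0.5, sqrt(1 / 12)

# Draw `reps` samples of size n and record each sample mean
sample_means = [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]

sim_mean = fmean(sample_means)
sim_sd = pstdev(sample_means)

print("mean of sample means: %.4f (theory %.4f)" % (sim_mean, mu))
print("sd of sample means:   %.4f (theory %.4f)" % (sim_sd, sigma / sqrt(n)))
```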
Central Limit Theorem (CLT)
We have described the center and spread of the probability distribution of $\bar{x}$. What about
its shape?
It can be shown that if we have a population with the
$N(\mu, \sigma)$
distribution, then the distribution of the sample mean of n independent observations is
$N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$.
Moreover,
CLT: Draw an SRS of size n from any population with mean $\mu$ and finite standard
deviation $\sigma$. When n is large enough, the sampling distribution of the sample mean $\bar{x}$ is
approximately $N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$.
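One way to see the CLT at work is to start from a strongly right-skewed population and watch the skewness of the sample means shrink as n grows. The sketch below assumes an exponential population with mean 1, which is an illustrative choice and not the call-length data; `skewness` and `simulate_means` are hypothetical helpers.

```python
import random
from math import sqrt

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    m = sum(values) / len(values)
    s = sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulate_means(n, reps, rng):
    """Means of `reps` samples of size n from an ASSUMED Exponential(1) population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(1)
skew = {}
for n in (1, 5, 80):
    skew[n] = skewness(simulate_means(n, 5000, rng))
    print("n = %2d  skewness of sample means = %.2f" % (n, skew[n]))
```

A Normal distribution has skewness 0, so values approaching 0 as n increases are consistent with the sampling distribution becoming approximately Normal.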
Example: Let's look at the histogram of the lengths of telephone calls again. For that
example, and seconds. Consider a sample of size 80.
The mean time is hour and the standard deviation is hour. Your company
operates 70 of these units. What is the probability that their average maintenance time
exceeds 50 minutes?
Let $\bar{x}$ = sample mean time spent working on 70 units.
Actual probability = 0.9294
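A hedged sketch of the calculation. The numeric values of the mean and standard deviation were lost from the notes, so the code assumes both equal 1 hour; it further assumes, for the "actual" figure only, that individual maintenance times are exponential, in which case the total time follows a gamma distribution whose tail can be written as a Poisson sum. Neither assumption is stated in the text above.

```python
from math import exp, sqrt
from statistics import NormalDist

mu, sigma = 1.0, 1.0      # ASSUMPTION: mean and sd in hours (values missing in the notes)
n = 70                    # number of units
threshold = 50 / 60       # 50 minutes, expressed in hours

# CLT: x-bar is approximately N(mu, sigma / sqrt(n))
approx = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(threshold)

# ASSUMPTION: exponential times with mean mu, so (total time)/mu ~ Gamma(n, 1) and
# P(Gamma(n, 1) > t) = P(Poisson(t) <= n - 1)
t = n * threshold / mu
term = exp(-t)            # Poisson probability of 0
exact = term
for k in range(1, n):     # add Poisson probabilities of 1, ..., n - 1
    term *= t / k
    exact += term

print("Normal approximation: %.4f" % approx)
print("exact under the exponential assumption: %.4f" % exact)
```

Under these assumptions the exact value can be compared with the "Actual probability" quoted above; the Normal answer also depends on how much the z-score is rounded before a table lookup.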
A few more facts:
The Normal approximation for sample proportions and counts is an example of the
CLT.
Why?
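One way to see this: code each individual as 1 (success) or 0 (failure). The count X is then the sum of n such values and the sample proportion is their mean, so the CLT applies to a 0/1 population with mean p and standard deviation √(p(1 - p)). The short check below confirms that the CLT formulas reproduce the Normal approximation parameters for counts and proportions; the values n = 500 and p = 0.4 reuse the earlier binomial example, and the function name is illustrative.

```python
from math import isclose, sqrt

def zero_one_population(p):
    """Mean and sd of a population of 1s (proportion p) and 0s (proportion 1 - p)."""
    mean = p
    sd = sqrt(p * (1 - p))   # variance is E[X^2] - p^2 = p - p^2
    return mean, sd

n, p = 500, 0.4
mu, sigma = zero_one_population(p)

# CLT for the sample mean of the 0/1 values, which is p-hat
phat_mean, phat_sd = mu, sigma / sqrt(n)

# The count is n times the sample mean
count_mean, count_sd = n * mu, n * phat_sd

print("p-hat approx N(%.2f, %.4f)" % (phat_mean, phat_sd))
print("X     approx N(%.1f, %.3f)" % (count_mean, count_sd))
```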
If they play independently, what is the probability that Tom will score lower than George
and thus do better in the tournament?