Normalcurvegrade11 Final
Anthony J Greene
A Normal Distribution: Age at Retirement
[Figure: histogram of frequency (0 to 2000) by age group, 31-35 through 96-100]
Normally Distributed Variables
Probability and the Normal
Distribution
Probability is the Underlying Cause of the
Normal Distribution
Possible outcomes
for four coin tosses
HHHH HHHT HHTH HHTT
HTHH HTHT HTTH HTTT
THHH THHT THTH THTT
TTHH TTHT TTTH TTTT
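As a rough illustration (a Python sketch, not part of the original slides), the 16 outcomes above can be enumerated and grouped by the number of heads. The names `outcomes` and `heads_counts` are made up for this example.

```python
from itertools import product
from collections import Counter

# All 2**4 = 16 equally likely sequences of four tosses
outcomes = ["".join(seq) for seq in product("HT", repeat=4)]

# Count how many sequences contain 0, 1, 2, 3, or 4 heads
heads_counts = Counter(seq.count("H") for seq in outcomes)

for heads in range(5):
    print(heads, "heads:", heads_counts[heads], "of", len(outcomes))
```

The counts rise to a peak in the middle and fall off symmetrically, which is the beginning of a bell shape.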
Another Example
2 Dice
Possible outcomes:
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
Another Example
x     f(x)
2     1
3     2
4     3
5     4
6     5
7     6
8     5
9     4
10    3
11    2
12    1
[Figure: bar chart of f(x) (0 to 7) against x, the sum of the two dice (2 to 12)]
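The frequencies f(x) for the sum of two dice can be reproduced by counting pairs. This is a minimal Python sketch; `rolls` and `freq` are illustrative names, not taken from the slides.

```python
from itertools import product
from collections import Counter

# All 36 equally likely (first die, second die) pairs
rolls = list(product(range(1, 7), repeat=2))

# f(x): number of pairs whose faces sum to x
freq = Counter(a + b for a, b in rolls)

# Print a small text histogram, one row per possible sum
for total in range(2, 13):
    print(total, freq[total], "#" * freq[total])
```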
Examples of the Normal Distribution
• Age
• Height
• Weight
• I.Q.
• Sick Days per Year
• Hours Sleep per Night
• Words Read per Minute
• Calories Eaten per Day
• Hours of Work Done per Day
• Eyeblinks per Hour
• Insulting Remarks per Week
• Number of Pairs of Socks Owned
The most important distribution in statistical science is
the normal distribution, which has a "bell-shaped" curve.
There are several reasons why it is considered the most
important curve in statistics.
(a) Many random variables are normally distributed or
at least approximately so. Heights, weights, examination
scores, and the log of the length of life of some equipment
are a few examples of random variables that are
approximately normally distributed. Although these
distributions are only approximately normal, the
approximation is usually quite close.
(b) It is easy for mathematical statisticians to
work with the normal curve. A number of
hypothesis tests and the regression model are
based on the assumption that the underlying
data have normal distributions.
The graph of the normal distribution depends on
two factors: (1) the mean μ and (2) the standard
deviation σ.
In fact, the mean and standard deviation
characterize the whole distribution. That is, we can
get areas under the normal curve given information
about the mean and standard deviation.
The mean determines the location of the center of
the bell shaped curve. Thus, a change in the value
of the mean shifts the graph of the normal curve
to the right or to the left.
The standard deviation determines the shape
of the graph (in particular, the height and
width of the curve). When the standard
deviation is large, the normal curve is short
and wide, while a small standard deviation
yields a skinnier and taller graph.
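A short Python sketch can make the roles of the mean and standard deviation concrete. It uses the standard normal density formula; the function name `normal_pdf` and the sample values of sigma are chosen only for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Same mean, different standard deviations: the peak sits at x = mu
# and its height is 1 / (sigma * sqrt(2 * pi)), so larger sigma means a lower peak.
for sigma in (0.5, 1.0, 2.0):
    print("sigma =", sigma, "peak height =", round(normal_pdf(0.0, 0.0, sigma), 4))

# Changing the mean only shifts the curve; the peak height is unchanged.
print(normal_pdf(0.0, 0.0, 1.0) == normal_pdf(5.0, 5.0, 1.0))
```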
Shapes of the Normal Distribution
Kurtosis
• Leptokurtic - more peaked
• Platykurtic - flat-topped curves
• Mesokurtic - normal/ordinary
Every normal curve (regardless of its mean or
standard deviation) conforms to the following
"empirical rule" (also called the 68-95-99.7
rule):
• About 68% of the area under the curve falls
within 1 standard deviation of the mean.
• About 95% of the area under the curve falls
within 2 standard deviations of the mean.
• Nearly the entire distribution (about 99.7% of
the area under the curve) falls within 3 standard
deviations of the mean.
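The 68-95-99.7 figures can be checked numerically. The sketch below relies on the identity that the area within k standard deviations of the mean equals erf(k / sqrt(2)) for any normal curve; `area_within` is a hypothetical helper name.

```python
import math

def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, "standard deviation(s):", round(100 * area_within(k), 2), "%")
```

The rule quotes these areas rounded to convenient values, which is why it is stated with "about".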
• Determine what frequency and relative
frequency of babies' weights are within
a. one standard deviation from the mean
b. two standard deviations from the mean
c. three standard deviations from the mean
Babies' weights (n = 36):
4.94 4.69 5.16 7.29 7.19 9.47 6.61 5.84 6.83
3.45 2.93 6.38 4.38 6.76 9.01 8.47 6.8 6.4
8.6 3.99 7.68 2.24 5.32 6.24 6.19 5.63 5.37
5.26 7.35 6.11 7.34 5.87 6.56 6.18 7.35 4.21
Frequencies expected under the empirical rule:
a. 36 × 0.68 = 24.48
b. 36 × 0.95 = 34.2
c. 36 × 0.997 = 35.892
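The observed frequencies can be counted directly from the 36 values and compared with the empirical-rule figures. This sketch assumes the listed numbers are the babies' weights and uses the sample standard deviation (`statistics.stdev`); the slides do not say which form of the standard deviation was intended.

```python
from statistics import mean, stdev

weights = [
    4.94, 4.69, 5.16, 7.29, 7.19, 9.47, 6.61, 5.84, 6.83,
    3.45, 2.93, 6.38, 4.38, 6.76, 9.01, 8.47, 6.8, 6.4,
    8.6, 3.99, 7.68, 2.24, 5.32, 6.24, 6.19, 5.63, 5.37,
    5.26, 7.35, 6.11, 7.34, 5.87, 6.56, 6.18, 7.35, 4.21,
]

n = len(weights)
m = mean(weights)
s = stdev(weights)  # sample standard deviation (an assumption)

# counts[k]: number of weights within k standard deviations of the mean
counts = {k: sum(1 for w in weights if abs(w - m) <= k * s) for k in (1, 2, 3)}

for k, count in counts.items():
    print(k, "sd: frequency", count, "relative frequency", round(count / n, 3))
```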
What do we do with Normal Distributions?
Comparing Two Distributions
Height a) in inches b) in centimeters
inches × 2.54 = centimeters
[Figure: the same height distribution in two panels, vertical axis 0 to 0.16; (a) 56 to 80 inches, (b) 142 to 202 centimeters]
Transformations
Standardized Normally Distributed Variable
$z = \frac{x - \mu}{\sigma}$
Standardizing normal distributions
Standard Normal Distributions
• The z-score transformation is entirely
reversible and allows any two distributions to be
compared (e.g., I.Q. and SAT scores: does a top
I.Q. score correspond to a top SAT score?)
• z-scores all have a mean of zero and a
standard deviation of 1, which gives them the
simplest possible mathematical properties.
Standard Normal Distributions
An example of a z transformation from a
variable (x) with mean 3 and standard deviation 2
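A small sketch of this transformation, using the mean of 3 and standard deviation of 2 from the example. The particular x values and the helper names `to_z` and `to_x` are made up for illustration.

```python
mu, sigma = 3.0, 2.0  # mean and standard deviation from the example

def to_z(x):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

def to_x(z):
    """Reverse the transformation: recover x from its z-score."""
    return mu + z * sigma

for x in (-3, -1, 1, 3, 5, 7, 9):
    z = to_z(x)
    print("x =", x, "-> z =", z, "-> back to x =", to_x(z))
```

The round trip through `to_x` shows that the transformation is reversible: no information about x is lost.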
Understanding x and z-scores
Basic Properties of the Standard
Normal Curve
Property 1: The total area under the standard normal curve
is equal to 1.
Property 2: The standard normal curve extends indefinitely
in both directions, approaching, but never touching, the
horizontal axis as it does so.
Property 3: The standard normal curve is symmetric about
0; that is, the left side of the curve is a mirror image
of the right side of the curve.
Property 4: Most of the area under the standard normal
curve lies between –3 and 3.
The Normal Distribution:
why use a table?
$$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}\, dx$$
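The integral has no elementary closed form, which is the motivation for a table. As a hedged Python sketch, the area can be approximated by brute-force numerical integration and compared with the table-style shortcut of standardizing and subtracting two cumulative areas. The values of `mu`, `sigma`, `x1` and `x2` are hypothetical, and `phi` stands in for a z-table lookup.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area_by_trapezoids(x1, x2, mu, sigma, steps=10_000):
    """Approximate the integral of the normal density from x1 to x2."""
    width = (x2 - x1) / steps
    total = 0.5 * (normal_pdf(x1, mu, sigma) + normal_pdf(x2, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(x1 + i * width, mu, sigma)
    return total * width

def phi(z):
    """Standard normal area to the left of z (what a z table lists)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, x1, x2 = 100.0, 15.0, 85.0, 115.0  # illustrative values only
numeric = area_by_trapezoids(x1, x2, mu, sigma)
table_style = phi((x2 - mu) / sigma) - phi((x1 - mu) / sigma)
print(round(numeric, 4), round(table_style, 4))
```

Both routes give the same area; the table route needs only two lookups and a subtraction.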
Finding percentages for a normally
distributed variable from areas under
the standard normal curve
P(Z > 1.6) = 1 – P(Z < 1.6)
= 1 – 0.9452
=0.0548
• What is P(Z < 1.52)?
P(0.9 < Z < 1.9) = P( Z < 1.9) – P (Z < 0.9)
=0.9713 – 0.8159
= 0.1554
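The same areas can be obtained without a printed table. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) as a stand-in for the z table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_right = 1 - Z.cdf(1.6)             # P(Z > 1.6), complement rule
p_left = Z.cdf(1.52)                 # P(Z < 1.52), direct lookup
p_between = Z.cdf(1.9) - Z.cdf(0.9)  # P(0.9 < Z < 1.9), difference of two areas

print(round(p_right, 4), round(p_left, 4), round(p_between, 4))
```

The last digit can differ slightly from a table-based answer, because table entries are themselves rounded to four decimal places before they are subtracted.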
Find the probabilities for each using the
standard normal distribution.
1. P(0 < z < 1.65) = 0.4505 or 45.05%
2. P(-2.3 < z < 0) = 0.4893 or 48.93%
3. P(z > 0.56) = 0.2877 or 28.77%
4. P(z < -1.8) = 0.0359 or 3.59%
5. P(-2.3 < z < 0.79) = 0.7745 or 77.45%
The zα Notation
zα is the z-score with area α to its right under the standard normal curve.
1 - 0.025 = 0.975
The z-score with area 0.9750 to its left is 1.96.
Equivalently, the z-score with area 0.025 to its right is 1.96, so z0.025 = 1.96.
Finding z0.05
1 - 0.05 = 0.95
z0.05 = 1.645
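Reading the table backwards (from an area to a z-score) corresponds to the inverse cumulative distribution function. A minimal sketch using `statistics.NormalDist.inv_cdf` follows; the helper name `z_alpha` is an assumption made for this example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def z_alpha(alpha):
    """z-score with area alpha to its right under the standard normal curve."""
    return Z.inv_cdf(1 - alpha)

print(round(z_alpha(0.025), 2))  # area 0.975 to the left
print(round(z_alpha(0.05), 3))   # area 0.95 to the left
```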
Find the z value that corresponds to the given area.
[Figures showing the given areas were not recovered]
1. z = 2.13
2. z = -1.32
3. z = -2.13
4. z = -1.26
5. z = 1.98
6. z = 1.84
7. z = -1.91
8. z = 0.56
Finding the two z-scores dividing the area
under the standard normal curve into a
middle 0.95 area and two outside 0.025
areas
1. μ and σ are given.
2. a and b are any two values of the variable x.
3. Compute z-scores for a and b.
4. Consult the z distribution table.
5. Perform the necessary operation (for example, subtract one table area from the other).
Given that a quiz has a mean score
of 14 and an s.d. of 3, what
proportion of the class will score
between 9 & 16?
1. μ = 14 and σ = 3.
2. a = 9 and b = 16.
3. $z_a = \frac{9 - 14}{3} = \frac{-5}{3} = -1.67$
   $z_b = \frac{16 - 14}{3} = \frac{2}{3} = 0.67$
4. From the z table, P(Z < 0.67) = 0.7486 and P(Z < −1.67) = 0.0475.
5. P(9 < x < 16) = 0.7486 − 0.0475 = 0.7011, so about 70% of the class will score between 9 and 16.
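The steps for the quiz example can be sketched in Python as follows, with `statistics.NormalDist` standing in for the z table. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 14, 3  # step 1: mean and standard deviation of the quiz scores
a, b = 9, 16       # step 2: the two values of x

# Step 3: z-scores for a and b
z_a = (a - mu) / sigma
z_b = (b - mu) / sigma

# Steps 4 and 5: look up both cumulative areas and subtract
Z = NormalDist()
proportion = Z.cdf(z_b) - Z.cdf(z_a)

print(round(z_a, 2), round(z_b, 2), round(proportion, 4))
```

Because the code keeps the unrounded z-scores, its result can differ in the third decimal place from an answer read off a table at z = −1.67 and z = 0.67.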
“Just over half the time, 53% or so, a computer will have
an assembly time between 45 minutes and 1 hour”
Example
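• The return on investment is normally distributed with a
mean of 10 and a standard deviation of 5. What is the
probability of losing money?
$P(X < 0) = P\left(\frac{X - \mu}{\sigma} < \frac{0 - 10}{5}\right) = P(Z < -2) = 0.5 - P(0 < Z < 2) = 0.5 - 0.4772 = 0.0228$
• If the standard deviation is 10 instead:
$P(X < 0) = P\left(\frac{X - \mu}{\sigma} < \frac{0 - 10}{10}\right) = P(Z < -1) = 0.5 - P(0 < Z < 1) = 0.5 - 0.3413 = 0.1587$
• Thus, increasing the standard deviation increases the
probability of losing money, which is why the standard
deviation (or the variance) is a measure of risk.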
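The effect of the standard deviation on the probability of a loss can be checked with a short sketch. It compares the stated standard deviation of 5 with a larger value of 10, chosen to match the Z < −1 step in the calculation; `p_losses` is a made-up name.

```python
from statistics import NormalDist

mean_return = 10

# P(X < 0) for each standard deviation: the probability of losing money
p_losses = {sd: NormalDist(mean_return, sd).cdf(0) for sd in (5, 10)}

for sd, p in p_losses.items():
    print("standard deviation =", sd, "P(loss) =", round(p, 4))
```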
At a job fair, 3,000 applicants applied for a job.
Their mean age was found to be 28 years, with a
standard deviation of 4 years.