CH 6 - CH 8 (Notes)
STATISTICS NOTES: CHAPTER 6 - CHAPTER 8
Chapter 6
The Normal Distribution
CH6.1 Introducing Normally Distributed Variables
What is a distribution?
Distribution of a Data Set: The distribution of a data set is a table, graph, or formula
that provides the values of the observations and how often they occur.
[Figure: histogram of normal data (vertical axis: Relative Frequency; horizontal axis from -4 to 4) and the normal curve with mean = 0, sigma = 1.]
Terminology:
If a variable of a population is normally distributed and is the only variable
under consideration, we say that the population is normally distributed or that it
is a normally distributed population.
More commonly:
If a variable’s distribution is shaped roughly like a normal curve, we say that
the variable is an approximately normally distributed variable or that it has
approximately a normal distribution.
The normal curve has two parameters, μ and σ. The mean of the normal distribution is μ
and its standard deviation is σ. The normal curve:
Is bell-shaped.
Is centered at μ.
Is close to the x-axis outside the range from μ − 3σ to μ + 3σ.
What’s your best guess for σ (the standard deviation)?
Key Fact: For a normally distributed variable, the percentage of all possible
observations that lie within any specified range equals the corresponding area
under its associated normal curve, expressed as a percentage. The result holds
approximately for a variable that is approximately normally distributed.
For any normal distribution, the mean and standard deviation completely determine
the curve. To avoid needing a different table for each normal curve (i.e. each mean
and standard deviation), we standardize our normally distributed variable.
Plan:
1. Learn to calculate specified areas under a Standard Normal Curve with
μ=0 , σ=1
2. To calculate specified areas under any normal curve, convert to z scores. Since the z
score has a Standard Normal Distribution, we can use the Normal Table to calculate
area and translate the information to the general normal variable.
CH 6.2- Areas Under the Standard Normal Curve
Basic Properties of the Standard Normal Curve
Example: To find the area under the standard normal curve that lies to the left of
1.23, look up z = 1.23 in Table II. The area is 0.8907.
Finding the Area to the Right of a Specified Value
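Example: To determine the area under the standard normal curve that lies to the
right of 0.76.
We want the area to the right of 0.76, but Table II gives the area to the left, which is 0.7764.
We correct by subtracting from 1: 1 − 0.7764 = 0.2236.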
Example: Find the area between -0.51 and 1.87
The three types of lookup:
Area to the left of z: a direct lookup.
Area to the right of z: 1 − (direct lookup).
Area between z1 and z2: two direct lookups and a subtraction.
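The three kinds of area above can also be checked with software instead of Table II. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper names `area_left`, `area_right`, and `area_between` are made up for illustration. Software does not round z or the areas to table precision, so the last digit may differ slightly from a table answer.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

def area_left(z):
    """Direct lookup: area to the left of z."""
    return Z.cdf(z)

def area_right(z):
    """1 minus a direct lookup: area to the right of z."""
    return 1 - Z.cdf(z)

def area_between(z1, z2):
    """Two direct lookups and a subtraction: area between z1 and z2."""
    return Z.cdf(z2) - Z.cdf(z1)

# The three examples from this section.
left = area_left(1.23)
right = area_right(0.76)
between = area_between(-0.51, 1.87)
print(round(left, 4), round(right, 4), round(between, 4))
```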
Important Areas
Almost 100% (99.74%) of the curve lies between -3 and +3 standard deviations.
The area to the left of -3 is equal to 0.0013 or 0.13% and the area to the right of
+3 is equal to 0.0013 or 0.13%.
The z-score such that the area to the left of the z-score is 0.7157 is z=0.57
Example: To determine the z-value for which the area under the standard normal
curve to the left is 0.04.
1. Draw a diagram.
2. Look up in the body of Table II the value of 0.04. (We are looking for area).
Notice that the closest value in the table to 0.04 is 0.0401. Use this value.
3. Look in the column beside 0.0401 to find the corresponding z value. Here we
find that z = -1.75.
Example: To find the Z-score such that the area to the right of the z-score is 0.3021.
1. Find the area to the left of the z-score. It is 1-0.3021 = 0.6979.
2. Look up in the body of Table II to find the closest number to 0.6979, which is
0.6985.
3. Find the z-Score corresponding to this number. It is 0.52.
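Looking up an area in the body of the table and reading off z is the inverse of a direct lookup. A small sketch of the two examples above, assuming Python's `statistics.NormalDist` is available; the function names are illustrative only. The software returns an unrounded z, so it is rounded to two decimals here to compare with the table answers.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve

def z_for_left_area(area):
    """z-score with the given area to its left."""
    return Z.inv_cdf(area)

def z_for_right_area(area):
    """z-score with the given area to its right (left area is 1 - area)."""
    return Z.inv_cdf(1 - area)

z_a = z_for_left_area(0.04)      # table answer in the notes: -1.75
z_b = z_for_right_area(0.3021)   # table answer in the notes: 0.52
print(round(z_a, 2), round(z_b, 2))
```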
z_α is used to denote the z-score having an area of α to its right under the
standard normal curve.
Example: To find z_0.25, we are looking for the z-value such that the area to the right of the z-value is 0.25.
This means that the area to the left of the z-value is 0.75. The closest value in Table II is 0.7486, so z_0.25 = 0.67.
Example: Find the z-scores that separate the middle 80% of the area under the normal
curve from the 20% in the tails.
z1 is the z-score such that the area to the left is 0.1, so z1 = -1.28. By symmetry, z2 is the z-score such that the area to the right is 0.1, so z2 = 1.28.
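The z_α notation and the "middle area" problem can be sketched in a few lines. This is only an illustration using the standard library; `z_alpha` and `middle_cutoffs` are hypothetical helper names, not part of any course software.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve

def z_alpha(alpha):
    """z-score with area alpha to its right."""
    return Z.inv_cdf(1 - alpha)

def middle_cutoffs(middle_area):
    """Pair of z-scores that cut off the given middle area, split evenly in the tails."""
    tail = (1 - middle_area) / 2
    z = z_alpha(tail)
    return -z, z

z1, z2 = middle_cutoffs(0.80)
print(round(z1, 2), round(z2, 2))
print(round(z_alpha(0.25), 2))
```

The same function can be used to check the middle 0.95 exercise after drawing the picture.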
Exercise: Find 2 z-scores that divide the area under the standard normal curve into a
middle 0.95 area and two outside 0.025 areas. Draw a picture first.
Determining the Percentage or Probability for a Normally Distributed Variable
For a general normal random variable X with mean μ and standard deviation σ,
the variable Z = (X − μ)/σ has a standard normal probability distribution.
We can use this relationship to perform calculations for X.
Values of X ↔ Values of Z
If x is a value for X, then z = (x − μ)/σ is a value for Z.
This is a very useful relationship. Because of this relationship,
probabilities for X are directly related to probabilities for Z using Z = (X − μ)/σ.
To find P(X < x) for a general normal random variable, we calculate P(Z < (x − μ)/σ).
For example, with μ = 3 and σ = 2, z = (4 − 3)/2 = 0.5. Therefore, P(X < 4) = P(Z < 0.5).
Use Table II to solve the problem for the standard normal Z.
The answer will be the same for the general normal X.
[Figures: normal curves for the worked examples, with μ = 3, σ = 2; μ = −2, σ = 4; and μ = 6, σ = 4.]
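As a quick check of the standardizing step, the sketch below computes P(X < 4) two ways: by converting to a z-score first, and directly from the general normal curve. It assumes that the P(X < 4) = P(Z < 0.5) example uses μ = 3 and σ = 2 (the first parameter pair listed in the notes); `prob_less_than` is an illustrative name.

```python
from statistics import NormalDist

def prob_less_than(x, mu, sigma):
    """P(X < x) for a normal X, found by standardizing to a z-score."""
    z = (x - mu) / sigma          # z = (x - mu) / sigma
    return NormalDist().cdf(z)    # area to the left of z under the standard normal curve

# Assumed parameters for the example: mu = 3, sigma = 2.
p_via_z = prob_less_than(4, mu=3, sigma=2)
p_direct = NormalDist(mu=3, sigma=2).cdf(4)
print(round(p_via_z, 4), round(p_direct, 4))
```

Both routes give the same probability, which is the point of standardizing: one table is enough for every normal curve.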
Exercise: IQs are normally distributed with a mean of 100 and a standard deviation of
16. What percentage of people have IQs between 115 and 140?
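The 68.26 – 95.44 – 99.74 rule.
1. 68.26% of all possible observations lie within one standard deviation to either
side of the mean, that is, between μ−σ and μ+σ.
2. 95.44% of all possible observations lie within two standard deviations to either
side of the mean, that is, between μ−2σ and μ+2σ.
3. 99.74% of all possible observations lie within three standard deviations to either
side of the mean, that is, between μ−3σ and μ+3σ.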
Exercise: Consider IQs with a mean of 100 and a standard deviation of 16. Show the
68.26 – 95.44 – 99.74 rule for this variable.
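One way to check the rule for IQs is to compute the three intervals and the area inside each. A minimal sketch with the standard library follows; `k_sigma_rule` is a made-up helper name. The percentages computed by software differ from 68.26, 95.44, and 99.74 by about 0.01 because the notes' figures come from four-decimal table values.

```python
from statistics import NormalDist

def k_sigma_rule(mu, sigma, ks=(1, 2, 3)):
    """For each k, return (k, lower, upper, percent within k standard deviations)."""
    dist = NormalDist(mu, sigma)
    rows = []
    for k in ks:
        lower, upper = mu - k * sigma, mu + k * sigma
        percent = 100 * (dist.cdf(upper) - dist.cdf(lower))
        rows.append((k, lower, upper, percent))
    return rows

rows = k_sigma_rule(mu=100, sigma=16)  # IQs: mean 100, standard deviation 16
for k, lower, upper, percent in rows:
    print(f"within {k} sd: {lower} to {upper}, about {percent:.2f}%")
```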
Exercise: Obtain the 90th percentile for IQs.
1. Sketch.
2. Shade.
Exercise: For a normal variable with μ = −2 and σ = 4, find the value x such that P(X > x) = 0.2.
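Percentile problems run the z-score conversion backwards: find z for the given area, then use x = μ + z·σ. The sketch below is one way to check the two exercises above after sketching and shading; it assumes IQs have mean 100 and standard deviation 16 as stated earlier, and `value_with_left_area` is an illustrative name.

```python
from statistics import NormalDist

def value_with_left_area(area, mu, sigma):
    """Value x of a normal variable such that P(X < x) equals the given area."""
    z = NormalDist().inv_cdf(area)  # z-score with that area to its left
    return mu + z * sigma           # convert the z-score back to the x scale

# 90th percentile of IQs (mean 100, standard deviation 16).
iq_90 = value_with_left_area(0.90, mu=100, sigma=16)

# x with P(X > x) = 0.2 is the same as x with P(X < x) = 0.8.
x_cut = value_with_left_area(1 - 0.2, mu=-2, sigma=4)
print(round(iq_90, 1), round(x_cut, 2))
```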
CH 6.4- Assessing Normality: Normal Probability Plots
Many real-world variables have bell-shaped histograms, so it is plausible that
they have normal probability distributions.
We need methods to assess whether this is a good assumption or not.
The main method used to assess whether sample data are approximately normal
is the normal probability plot.
This plot graphs the observed data, ranked in ascending order, against the
“expected” z-score of each rank.
The chart compares
o The lowest observed value with where it is expected to be (according to
the normal)
o The second lowest observed value with where it is expected to be
(according to the normal)
.
.
.
o The highest observed value with where it is expected to be (according to
the normal)
The expected lowest value, the expected second lowest value, etc. are not easy
to derive
Technology should be used to construct these graphs
If the sample data was taken from a normal random variable, then this plot
should be approximately linear
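To make the idea of "expected z-score of each rank" concrete, here is a rough sketch of how the points of a normal probability plot could be computed. It is not the method used by any particular statistics package: the plotting positions (i − 0.375)/(n + 0.25) are one common approximation chosen here as an assumption, the sample values are hypothetical, and the correlation is only an informal measure of how linear the plot is.

```python
from statistics import NormalDist

def normal_scores(n):
    """Approximate expected z-scores for ranks 1..n.
    Assumption: plotting positions (i - 0.375) / (n + 0.25)."""
    Z = NormalDist()
    return [Z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def probability_plot_points(data):
    """Pair each observation (in ascending order) with its normal score."""
    ordered = sorted(data)
    return list(zip(ordered, normal_scores(len(ordered))))

def correlation(pairs):
    """Pearson correlation of the plotted points; near 1 suggests a roughly linear plot."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

sample = [4.1, 5.3, 2.8, 6.0, 4.7, 3.9, 5.1, 4.4]  # hypothetical data
points = probability_plot_points(sample)
r = correlation(points)
for value, score in points:
    print(f"{value:5.1f}  {score:6.2f}")
```

In practice the points would be plotted by technology, as the notes recommend, and judged by eye.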
[Figure: Probability Plot of Other pain (Normal - 95% CI). Mean 4.44, StDev 2.128, N 25, AD 0.362, P-Value 0.416. Vertical axis: Percent; horizontal axis: Other pain.]
[Figure: Probability plot of Other Infection. Vertical axis: Percent; horizontal axis: Other Infection.]
Example: Both of these show that this particular data set is far from having a normal
distribution
We can assess whether sample data is approximately normal by using the normal
probability plot.
Chapter 7
The Sampling Distribution of the Sample Mean
CH 7.1-Distribution of the Sample Mean
Often the population is too large to perform a census … so we take a sample
How do the results of the sample apply to the population?
What’s the relationship between the sample mean and the population mean?
What’s the relationship between the sample standard deviation and the
population standard deviation?
This is statistical inference
Example: If we want to estimate the heights of eight year old girls, we can proceed as
follows:
Randomly select 100 eight year old girls
Compute the sample mean of the 100 heights
Use that as our estimate
This is using the sample mean to estimate the population mean
However,
if we take a series of different random samples of size 100
Sample 1 – we compute sample mean x̄1
Sample 2 – we compute sample mean x̄2
Sample 3 – we compute sample mean x̄3
Etc.
Each time we take a sample, we may get a different result
http://opl.apa.org/contributions/Rice/rvls_sim/stat_sim/sampling_dist/index.html
CH 7.2-The Mean and Standard Deviation of the Sample Mean
Moral:
1. The center (mean) of the distribution of x̄ remains the same for every sample size.
2. As the sample size increases, the standard deviation of the distribution of x̄
decreases.
Results:
Let x be a random variable with mean μ and standard deviation σ.
Then the sample mean x̄, for samples of size n, has mean
μ_x̄ = μ
and
standard deviation
σ_x̄ = σ/√n
(this is referred to as the standard error of the mean or just the standard error).
GIVEN: parent distribution                      RESULT: distribution of the sample mean x̄
Mean    Standard deviation    Sample size       Mean μ_x̄    Standard deviation σ_x̄
μ       σ                     n                 μ           σ/√n
Exercise:
Consider a random variable x that has a mean of 50 and a standard deviation of 8.
If a sample of size 64 is taken, what are the mean and standard deviation of x̄?
Exercise:
Consider a random variable x that has a mean of 50 and a standard deviation of 8.
If a sample of size 10 is taken:
What are the mean and standard deviation of x̄?
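The exercises above can be checked with the formula σ/√n, and the formula itself can be checked by simulation. The sketch below does both. The simulation assumes a normally distributed population purely for convenience (the notes do not specify the shape), and the function names, the seed, and the number of repeats are arbitrary choices.

```python
import math
import random
from statistics import mean, stdev

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def simulate_sample_means(mu, sigma, n, repeats, seed=0):
    """Draw `repeats` samples of size n and return the list of sample means.
    Assumption: the population is normal with mean mu and standard deviation sigma."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(repeats)]

se_64 = standard_error(8, 64)
se_10 = standard_error(8, 10)
print(se_64, round(se_10, 2))

# Many samples of size 64 from a population with mean 50 and standard deviation 8.
sample_means = simulate_sample_means(mu=50, sigma=8, n=64, repeats=2000)
print(round(mean(sample_means), 2), round(stdev(sample_means), 2))
```

The mean of the simulated sample means should land close to 50 and their standard deviation close to σ/√n, illustrating the two points under "Moral".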
Chapter 8
Confidence Intervals for One Population Mean
CH 8.1-Estimating a Population Mean
From chapter 7, we know that for large sample sizes (n ≥ 30) x̄ has an
approximately normal distribution with mean μ and standard deviation
σ_x̄ = σ/√n.
(μ − 1σ_x̄, μ + 1σ_x̄) has .6826 of the area
(μ − 2σ_x̄, μ + 2σ_x̄) has .9544 of the area
(μ − 3σ_x̄, μ + 3σ_x̄) has .9974 of the area
A confidence interval is an estimate of intervals like the .6826, .9544, .9974 intervals.
For example, suppose we know that x has a normal distribution but the
mean μ is unknown. The standard deviation is known to be 2 (σ = 2).
A sample of size 4 is taken and the sample mean is calculated.
To estimate μ we use x̄.
Standard deviation of x̄ = σ_x̄ = σ/√n = 2/√4 = 1
When x̄ = 42.28,
(42.28 - 2.4, 42.28 + 2.4) = (39.88, 44.68)
2. If we repeat the experiment a large number of times (many samples of size 36) and
each time we construct a 95.44% confidence interval, then we expect that 95.44% of
the time the confidence interval contains the population mean.
Exercise: Suppose x̄ = 45.1. What would the 95.44% confidence interval for μ
be?
So far we can calculate confidence intervals for confidence levels equal to 68.26%, 95.44%,
or 99.74%. Next we do so for any confidence level.
CH 8.2-Confidence Intervals for One Population Mean When σ is Known
Notation:
Confidence level = 1 − α
So α = 1 − confidence level.
Confidence level    α       α/2      z_{α/2}
0.95                0.05    0.025    1.96
Step 1 For a confidence level of 1 − α, use Table II to find z_{α/2}.
Step 2 The confidence interval for μ is from
x̄ − z_{α/2}·σ/√n to x̄ + z_{α/2}·σ/√n,
where z_{α/2} is found in Step 1, n is the sample size, and x̄ is computed from the
sample data.
Note: The C.I. is exact for normal populations and is approximately correct for large
samples from non-normal populations.
Use this procedure when:
n is greater than or equal to 30, or, if n is less than 30, the data appear to be normal.
σ is known.
Always graph your data. Only use a procedure that is appropriate for your data.
Example: Consider the ages of 50 randomly selected people with a population
standard deviation of 12.1 years and a sample mean of 36.4 years; find the 95%
confidence interval for the population mean of their ages.
Step 1:
C.L. = .95, α = .05, α/2 = .025, z_{α/2} = z_0.025 = 1.96
Step 2:
36.4 − 1.96·(12.1/√50) to 36.4 + 1.96·(12.1/√50), that is, 33.0 to 39.8
Or
(33.0, 39.8)
Step 3:
We are 95% confident that the population mean falls in the interval (33.0, 39.8).
If we repeat the experiment a large number of times (many samples of size 50) and
each time we construct a 95% confidence interval, then we would expect that 95% of
the time the confidence interval contains the population mean.
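The age example can be reproduced in a few lines. This is a sketch of the σ-known procedure using only the standard library; `z_interval` is an illustrative name, and the critical value is computed by software rather than read from Table II, so it is 1.95996... instead of the rounded 1.96.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}: area alpha/2 to its right
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Ages example: n = 50, sample mean 36.4, population standard deviation 12.1.
low, high = z_interval(xbar=36.4, sigma=12.1, n=50, confidence=0.95)
print(round(low, 1), round(high, 1))
```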
Example: A sample of size 49 is taken from a class. The sample average height is 64
inches. The population standard deviation for heights is known to be 1.6. What is the
point estimate of μ? Calculate the 90% confidence interval for μ. Why is the technique
you used valid?
CH 8.3 - Margin of Error
The margin of error for the estimate of μ is
E = z_{α/2}·σ/√n
The confidence interval has the form
x̄ − E to x̄ + E
We note that the margin of error is equal to ½ the length of the confidence interval.
For a fixed sample size, increasing the confidence level increases the width
of the interval, and vice versa.
C.L. = .90, α = .10, α/2 = .05, z_{α/2} = z_0.05 = 1.645
C.L. = .99, α = .01, α/2 = .005, z_{α/2} = z_0.005 = 2.58
Since the margin of error is E = z_{α/2}·σ/√n, increasing z_{α/2} increases the width of
the interval.
For a fixed confidence level, increasing the sample size decreases the width of
the interval.
Example: Comparing margins of error by changing the sample size.
Suppose the C.L. = .99 and σ = 1.5. Let’s see what happens to the margin of error
as n increases from 10 to 100.
E = z_{α/2}·σ/√n (half the width of the interval)
C.L. = .99, α = .01, α/2 = .005, z_{α/2} = z_0.005 = 2.58
For n = 10
E = 2.58·(1.5/√10) = 2.58·(1.5/3.16) = 1.22
For n = 100
E = 2.58·(1.5/√100) = 2.58·(1.5/10) = 0.387
So,
Increasing the sample size decreases the margin of error and hence decreases the
width of the interval.
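The comparison of n = 10 and n = 100 can be repeated for any sample size. Below is a small sketch; `margin_of_error` is an illustrative name. Software uses z_0.005 = 2.5758 rather than the table value 2.58, so the results differ from the hand calculation in the third decimal place.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """E = z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / math.sqrt(n)

e_10 = margin_of_error(sigma=1.5, n=10, confidence=0.99)
e_100 = margin_of_error(sigma=1.5, n=100, confidence=0.99)
print(round(e_10, 3), round(e_100, 3))
```

Multiplying the sample size by 10 divides the margin of error by √10, not by 10, which is why large gains in precision need much larger samples.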
Given
Known: σ
Margin of error: E
Confidence level: 1 - α
Find
The sample size
Since E = z_{α/2}·σ/√n,
solve for n and we get
n = (z_{α/2}·σ / E)²
Round up.
Example: Age of Civilian Labor Force
Determine the sample size required to ensure that we can be 95% confident that μ
is within 0.5 years of the estimate x̄. σ is known to be 12.2 years.
Given
Known: σ = 12.2 years
Margin of error: E = 0.5 years
Confidence level: 1 − α = .95
Find
The sample size
n = (z_{α/2}·σ / E)²
C.L. = .95, α = .05, α/2 = .025, z_{α/2} = z_0.025 = 1.96
n = (1.96·12.2 / 0.5)² = 2287.13, or 2288 (always round up)
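The sample-size formula, including the "always round up" step, can be sketched as follows. `required_sample_size` is an illustrative name; the critical value comes from software rather than Table II, which changes the unrounded result slightly but not the rounded-up answer for this example.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence):
    """Smallest n with margin of error at most `margin`: n = (z_{alpha/2} * sigma / E)^2, rounded up."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Age of civilian labor force: sigma = 12.2 years, E = 0.5 years, 95% confidence.
n_needed = required_sample_size(sigma=12.2, margin=0.5, confidence=0.95)
print(n_needed)
```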
CH 8.4- Confidence Intervals for One Population Mean When σ is Unknown
When σ is known, the standardized version of x̄,
z = (x̄ − μ)/(σ/√n),
has a standard normal distribution.
When σ is unknown, we replace σ with the sample standard deviation s and use
t = (x̄ − μ)/(s/√n).
t does not have a normal distribution; it has a t-distribution with n − 1 degrees of
freedom.
http://www.bilkent.edu.tr/~ktarik/econ222/TDist.html
http://media.pearsoncmg.com/ph/esm/statistics_datasets/sullivan_funstats2e/sfs2e_table.pdf
Example:
For 15 degrees of freedom, find t_0.05, the 95th percentile of the t-distribution.
For 28 degrees of freedom, find t_0.01, the 99th percentile of the t-distribution.
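The Python standard library has no t-table, so the sketch below finds t_α numerically: it integrates the t-density with Simpson's rule and then searches for the cutoff by bisection. This is only one possible way to check table values; the function names, the number of integration steps, and the search range 0 to 100 are arbitrary choices made for this illustration. A statistics package or the t-table would normally be used instead.

```python
import math

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def area_right(t, df, steps=2000):
    """Area under the t-curve to the right of t >= 0 (Simpson's rule on [0, t])."""
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # the curve is symmetric, so half the area lies right of 0

def t_alpha(alpha, df):
    """t-value with area alpha (0 < alpha < 0.5) to its right, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if area_right(mid, df) > alpha:
            lo = mid   # too much area to the right: move the cutoff right
        else:
            hi = mid
    return (lo + hi) / 2

t_a = t_alpha(0.05, 15)
t_b = t_alpha(0.01, 28)
print(round(t_a, 3), round(t_b, 3))
```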
Assumptions:
1. Normal population or large sample
2. σ unknown
The confidence interval for μ is from
x̄ − t_{α/2}·s/√n to x̄ + t_{α/2}·s/√n,
where t_{α/2} has n − 1 degrees of freedom, and s and x̄ are calculated from the sample data.
68.21 to 84.19
Interpretation: We are 95% confident that the population mean of the test scores is
between 68.2 and 84.2.
If we repeat the experiment a large number of times (many samples of size 30) and
each time we construct a 95% confidence interval, then we expect that 95% of the
time the confidence interval will contain the population mean.
Example: In 1908, W. S. Gosset published the article “The Probable Error of the
Mean” (Biometrika, Vol 6, pp. 1-25). In this pioneering paper, written under the
pseudonym “Student,” Gosset introduced what later became known as Student’s
t-distribution. Gosset used the following data set, which gives the additional sleep
obtained by a sample of 10 patients using laevohysocyamine hydrobromide.
Preliminary data analyses indicate that it is reasonable to assume the data were
generated by a normal process.
[Figure: Probability Plot of C1 (Normal - 95% CI). Mean 2.33, StDev 2.002, N 10, AD 0.357, P-Value 0.378. Vertical axis: Percent; horizontal axis: C1.]
Find a 95% confidence interval for the additional sleep that would be obtained on
average for all people using laevohysocyamine hydrobromide.
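The individual observations are not reproduced in these notes, so the sketch below works from the summary values reported with the probability plot (mean 2.33, standard deviation 2.002, n = 10). The critical value t_0.025 = 2.262 for 9 degrees of freedom is taken from a standard t-table and entered by hand; `t_interval` is an illustrative name.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown:
    xbar - t*s/sqrt(n) to xbar + t*s/sqrt(n), with t_crit supplied from a t-table."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# t_{0.025} with n - 1 = 9 degrees of freedom, read from a t-table.
T_CRIT_95_DF9 = 2.262

# Summary statistics as reported with the probability plot.
low, high = t_interval(xbar=2.33, s=2.002, n=10, t_crit=T_CRIT_95_DF9)
print(round(low, 2), round(high, 2))
```

Comparing the interval with 0 is one way to approach the question of whether the drug increased sleep on average.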
Was the drug effective in increasing sleep?