Section 5.3 and 5.4
Examples
Estimation (covered in sections 5.3 & 5.4 of chapter 5)
Hypothesis Tests (covered in section 5.5 of chapter 5)
Estimation takes one of two forms:
• Point estimate: a single value that estimates the parameter.
• Interval estimate: a range of values that estimates the parameter, associated with some chance that the parameter lies in this interval.
The quantities p, µ and σ² that are to be estimated are called
population parameters.
Example:
Suppose the mean time it takes to serve customers at a supermarket
checkout counter is to be estimated.
1) The mean service time of 100 customers, (say) $\bar{x}$ = 2.283 minutes, is
an example of a point estimate of the parameter µ.
2) If it is stated that we are 95% confident that the mean service time
will be from 1.637 minutes to 4.009 minutes, the interval of values
(1.637, 4.009) is an interval estimate of the parameter μ.
Since, from chapter 4, we have seen that a point estimate (a statistic) can
differ each time depending on the sample obtained, it is more appropriate
(and useful) to obtain an interval estimate for a parameter rather than just
one single value.
Therefore, in sections 5.3 and 5.4, we are going to focus on methods of
obtaining an interval estimate for the parameter µ.
Some more terminology…
A confidence interval is a range of values from L (lower value) to U
(upper value) that estimates a population parameter with (1 − α)100%
confidence.
(Note: The following slides show the theory behind the formula for an interval
estimate/confidence interval for µ; however, you do not need to know this theory.)
In order to answer the questions on the previous slide, we need to
determine a way to obtain a range of values (from a lower value to an
upper value) such that we know what the chance of a value falling into this
range is. We can use the standard normal distribution to obtain this:
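Therefore,
$$P\left(z_{\frac{\alpha}{2}} < Z < z_{1-\frac{\alpha}{2}}\right) = 1 - \alpha$$
Due to symmetry, these two z-values are related by $z_{\frac{\alpha}{2}} = -z_{1-\frac{\alpha}{2}}$.
Recall from chapter 4: $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, so
$$P\left(-z_{1-\frac{\alpha}{2}} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{1-\frac{\alpha}{2}}\right) = 1 - \alpha$$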
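Now we can solve for µ in the centre of this expression:
$$
\begin{aligned}
1 - \alpha &= P\left(-z_{1-\frac{\alpha}{2}} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{1-\frac{\alpha}{2}}\right) \\
&= P\left(-z_{1-\frac{\alpha}{2}} \times \frac{\sigma}{\sqrt{n}} < \bar{x} - \mu < z_{1-\frac{\alpha}{2}} \times \frac{\sigma}{\sqrt{n}}\right) && \text{(multiply by } \sigma/\sqrt{n}\text{)} \\
&= P\left(-\bar{x} - z_{1-\frac{\alpha}{2}} \times \frac{\sigma}{\sqrt{n}} < -\mu < -\bar{x} + z_{1-\frac{\alpha}{2}} \times \frac{\sigma}{\sqrt{n}}\right) && \text{(subtract } \bar{x}\text{)} \\
\therefore\ 1 - \alpha &= P\left(\bar{x} - z_{1-\frac{\alpha}{2}} \times \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{1-\frac{\alpha}{2}} \times \frac{\sigma}{\sqrt{n}}\right) && \text{(multiply by } -1\text{)}
\end{aligned}
$$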
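Therefore, we can be (1 − α)100% confident that µ will lie
between $\bar{x} - z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}$ and $\bar{x} + z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}$.
i.e. A (1 − α)100% confidence interval for µ, when the value of σ is known, is:
$$\left(\bar{x} - z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\ ;\ \bar{x} + z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\right)$$
The first value is the lower confidence limit (L) and the second value is the upper confidence limit (U).
OR A (1 − α)100% confidence interval for µ (σ known) is:
$$\bar{x} \pm E, \qquad \text{where the error is } E = z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}$$
The value of the error (E) therefore depends on the confidence percentage
((1 − α)100%) for the interval, the population standard deviation σ, and the size of the
sample (n) used to obtain the point estimate $\bar{x}$.
• $z_{1-\frac{\alpha}{2}}$ is referred to as the z-multiplier.
• $\dfrac{\sigma}{\sqrt{n}}$ is the standard error.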
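One way to build intuition for the (1 − α)100% confidence statement is to simulate it. The sketch below assumes a normal population with hypothetical values µ = 500, σ = 5 and n = 30 (illustrative choices only), draws many samples, builds the interval from the formula above each time, and counts how often the interval contains µ. The function names `z_interval` and `coverage` are made up for this illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def z_interval(xbar, sigma, n, confidence):
    """Return (L, U) for the mean when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z-multiplier
    error = z * sigma / math.sqrt(n)          # E = z * standard error
    return xbar - error, xbar + error


def coverage(mu, sigma, n, confidence, trials=2000, seed=1):
    """Proportion of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)                 # fixed seed: repeatable runs
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = z_interval(fmean(sample), sigma, n, confidence)
        hits += lower < mu < upper            # True counts as 1
    return hits / trials


# Hypothetical population values, chosen only for illustration.
print(coverage(mu=500, sigma=5, n=30, confidence=0.95))
```

With 95% confidence the printed proportion should land close to 0.95, although it varies slightly with the seed and the number of trials.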
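Example 1:
The actual content of cool drink in a 500 milliliter bottle is known to vary.
The standard deviation is known to be 5 milliliters. Thirty (30) of these 500
milliliter bottles were selected at random and their mean content found to
be 498.5 milliliters.
Calculate 95% and 99% confidence intervals for the mean content of all the
bottles.
• σ = 5: This is the standard deviation of the amount of cool drink in ALL the bottles, so it refers to the population standard deviation.
• n = 30: A sample of 30 bottles was selected.
• $\bar{x}$ = 498.5
For a 95% confidence interval for the mean content:
95% is the confidence percentage ∴ α = 0.05 ∴ α/2 = 0.025 ∴ 1 − α/2 = 0.975
∴ $z_{1-\frac{\alpha}{2}} = z_{0.975} = 1.96$
Therefore, 95% confidence interval for µ is:
$$\bar{x} \pm z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}} = 498.5 \pm 1.96\left(\frac{5}{\sqrt{30}}\right) = (496.71\ ;\ 500.29) = (L\ ;\ U)$$
where
$$L = 498.5 - 1.96\left(\frac{5}{\sqrt{30}}\right) = 496.71 \qquad U = 498.5 + 1.96\left(\frac{5}{\sqrt{30}}\right) = 500.29$$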
For a 99% confidence interval for the mean content:
99% is the confidence percentage ∴ α = 0.01 ∴ α/2 = 0.01/2 = 0.005 ∴ 1 − α/2 = 0.995
∴ $z_{1-\frac{\alpha}{2}} = z_{0.995} = 2.576$
Therefore, 99% confidence interval for µ is:
$$498.5 \pm 2.576\left(\frac{5}{\sqrt{30}}\right) = (496.15\ ;\ 500.85)$$
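The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch that plugs the table multipliers 1.96 and 2.576 straight into x̄ ± multiplier × σ/√n; the helper name `interval_from_multiplier` is hypothetical.

```python
import math


def interval_from_multiplier(xbar, sd, n, multiplier):
    """Return (L, U) = xbar -/+ multiplier * sd / sqrt(n)."""
    error = multiplier * sd / math.sqrt(n)
    return xbar - error, xbar + error


# Values from the cool drink example: x-bar = 498.5, sigma = 5, n = 30.
for label, z in (("95%", 1.96), ("99%", 2.576)):
    lower, upper = interval_from_multiplier(498.5, 5, 30, z)
    print(label, round(lower, 2), round(upper, 2))
```

The 99% interval is wider than the 95% interval because the larger z-multiplier increases the error E while the standard error stays the same.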
Determining the sample size when estimating the population mean µ
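Let’s consider what values can be changed in order to obtain the smallest
error possible…
The value of the error $E = Z_{1-\frac{\alpha}{2}}\dfrac{\sigma}{\sqrt{n}}$ depends on:
1) $Z_{1-\frac{\alpha}{2}}$ (the z-multiplier): this is fixed by the chosen confidence percentage.
2) σ (the population standard deviation): this is a property of the population.
3) n (the size of the sample): this is chosen when the study is planned.
Therefore, out of the three values used to determine the error E, n (the size
of the sample) is the best choice to change in order to get a certain error
E.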
Since the larger the sample size, the smaller the value of the error E, we
would ideally like to obtain the largest possible sample in order to get the
most precise confidence interval for µ.
However, in reality, quite often resources are limited and it is not always
possible to obtain a very large sample (it can be very costly and time-consuming).
We therefore need to choose a sample size large enough to obtain a
certain level of accuracy in our estimate but still have the sample size
small enough to be practical.
If we know beforehand (at the start of a study, before we obtain the
sample from the population to be studied) what level of accuracy we
want/need (i.e. if we know what the maximum size of the error E can be),
we can calculate what sample size should be obtained to achieve this
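accuracy.
Using the fact that the error $E = Z_{1-\frac{\alpha}{2}}\dfrac{\sigma}{\sqrt{n}}$, we can solve for n:
$$
\begin{aligned}
\sqrt{n}\,E &= Z_{1-\frac{\alpha}{2}}\,\sigma \\
\sqrt{n} &= \frac{Z_{1-\frac{\alpha}{2}}\,\sigma}{E} \\
\therefore\ n &= \left(\frac{Z_{1-\frac{\alpha}{2}}\,\sigma}{E}\right)^{2} && \text{(on the formula sheet)}
\end{aligned}
$$
Note: Always round UP to the nearest integer value no matter what the
decimal place is!!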
• This is due to the fact that the value of E is the MAXIMUM error we
want to obtain.
• Recall: from $E = Z_{1-\frac{\alpha}{2}}\dfrac{\sigma}{\sqrt{n}}$, as n increases, E decreases; and as n decreases, E
increases.
• Therefore, by rounding UP to the nearest integer value (thus
increasing n), it ensures E will not be more than the stipulated
maximum value.
Example 1:
Consider the earlier example on the mean content µ of 500 milliliter cool
drink bottles. The standard deviation of the amount of cool drink in the
bottles is 5 milliliters. Suppose it is desired to estimate the mean content
of the bottles with 95% confidence and an error that is not greater than
0.8. What sample size is needed to achieve this accuracy?
σ = 5, n = ?, max value of E = 0.8
95% confidence ∴ α/2 = 0.025 ∴ 1 − α/2 = 0.975 ∴ $Z_{1-\frac{\alpha}{2}} = Z_{0.975} = 1.96$
$$n = \left(\frac{Z_{1-\frac{\alpha}{2}}\,\sigma}{E}\right)^{2} = \left(\frac{1.96(5)}{0.8}\right)^{2} = 150.0625 \quad \therefore\ n = 151 \text{ (rounded UP!)}$$
Example 2:
A car manufacturer would like to estimate the average fuel consumption
of their latest model (in litres per 100km). All the cars of this particular
model were designed as similarly as possible so that the standard
deviation in the fuel consumption is only 0.8 ℓ/100km.
What sample size should be taken if the manufacturer is to be 99%
certain that the average consumption of the cars in the sample will be
within 0.3 ℓ/100km of the true average consumption of this model of
cars?
• This is specifying that the sample mean ($\bar{x}$) must be within 0.3 ℓ/100km of
the true value of the mean (µ), i.e. the distance between $\bar{x}$ and µ must be at
most 0.3.
• Recall: from the introduction of confidence intervals, the difference
between what is obtained for $\bar{x}$ from the sample and the true mean µ is
the error E. Therefore, max value of E = 0.3.
n = ?, σ = 0.8, max value of E = 0.3
99% confidence ∴ α/2 = 0.005 ∴ 1 − α/2 = 0.995 ∴ $Z_{1-\frac{\alpha}{2}} = Z_{0.995} = 2.576$
$$n = \left(\frac{Z_{1-\frac{\alpha}{2}}\,\sigma}{E}\right)^{2} = \left(\frac{2.576(0.8)}{0.3}\right)^{2} = 47.1877 \quad \therefore\ n = 48 \text{ (rounded UP!)}$$
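The sample-size formula and the round-UP rule can be sketched in Python as follows. The function names `required_sample_size` and `achieved_error` are hypothetical, and the z-multipliers are typed in from the tables rather than computed.

```python
import math


def required_sample_size(z, sigma, max_error):
    """n = (z * sigma / E)^2, always rounded UP to the next integer."""
    n_exact = (z * sigma / max_error) ** 2
    return math.ceil(n_exact)


def achieved_error(z, sigma, n):
    """Error E = z * sigma / sqrt(n) obtained with a sample of size n."""
    return z * sigma / math.sqrt(n)


# Cool drink example: 95% confidence, sigma = 5, max E = 0.8.
n1 = required_sample_size(1.96, 5, 0.8)
# Fuel consumption example: 99% confidence, sigma = 0.8, max E = 0.3.
n2 = required_sample_size(2.576, 0.8, 0.3)
print(n1, n2)

# Rounding down would break the requirement; rounding up keeps E within it.
print(achieved_error(1.96, 5, n1 - 1) > 0.8, achieved_error(1.96, 5, n1) <= 0.8)
```

The last line shows why the rounding direction matters: with one bottle fewer than the rounded-up sample size, the error exceeds the stipulated maximum of 0.8.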
Section 5.4: Confidence interval for the
population mean, µ (population variance σ²
unknown)
Recall: $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
• This is only true when the population standard deviation σ (and
therefore population variance σ²) is known.
• But, if σ is unknown, it can be replaced by its sample estimate S.
• However, for small sample sizes, the expression above follows a
t-distribution with n − 1 degrees of freedom instead of a standard normal
distribution (Z),
i.e. when σ is unknown AND the sample is small: $t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}} \sim t(n-1)$
Therefore, a (1 − α)100% confidence interval for µ, when σ is unknown AND the sample is small, is
$$\bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}}\frac{S}{\sqrt{n}} \qquad \text{(on the formula sheet)}$$
• S replaces σ.
• The z-multiplier ($z_{1-\frac{\alpha}{2}}$) is replaced by a t-multiplier ($t_{n-1;\,1-\frac{\alpha}{2}}$) with n − 1 degrees of freedom.
The same procedure as that from Section 5.3 is followed in order to construct
the confidence interval.
Tables for the t-distribution:
• The values in the first column represent the degrees of freedom (df).
• The values in the top row (p) represent the area under the curve to the
left of a t-value that appears in the body of the table at the intersection
of the row and column entry.
• Notation: $t_{df;\,p}$ denotes the t-value that has an area of p to the left, where
df is the degrees of freedom of the t-distribution.
• The t-tables differ from the standard normal tables in that the values in the
body of the table are the t-values, with the areas (to the left of the t-value) in
the corresponding top row.
• There are two t-tables: D1 (with areas ranging from 0.900 to 0.995) and D2
(with areas ranging from 0.980 to 0.999).
Example 1:
If df = 12 and the area to the left is 0.995: $t_{12;\,0.995} = 3.055$
Similarly, if df = 29 and the area to the left is 0.975: $t_{29;\,0.975} = 2.045$
Notice how for a very large degrees of freedom (df = ∞), the t-value for each
area to the left is equal to the z-value corresponding to the same area,
e.g. $t_{\infty;\,0.975} = z_{0.975} = 1.96$.
• There are only t-tables for upper percentiles (high values of the area p to the left), therefore
only positive t-values can be found using these tables.
• But due to symmetry, the area to the left of a negative t-value is equal
to the area to the right of the positive t-value (similar to the standard
normal distribution).
• Therefore, when a t-value with an area less than 0.5 to its left (i.e. p <
0.5) is to be determined, the following property can be used:
i.e. $t_{df;\,p} = -t_{df;\,1-p}$
Example:
df = 10 and the area to the left is 0.10.
• Since 0.10 < 0.5, the t-value that has this area of 0.10 to its left will be
negative.
• This t-value cannot be directly found from the tables.
∴ $t_{10;\,0.10} = -t_{10;\,1-0.10} = -t_{10;\,0.90} = -1.372$
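The table lookups and the symmetry property can be cross-checked numerically. The sketch below is not how tables D1 and D2 were produced; it is one possible approach that integrates the t-density with Simpson's rule and inverts the area by bisection, using only the Python standard library. The names `t_pdf`, `t_cdf` and `t_value` are hypothetical helpers.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """Area to the left of x: Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = 0.5 + total * h / 3
    return area if x >= 0 else 1.0 - area


def t_value(df, p):
    """t-value with area p to its left, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p < 0.5:
        return -t_value(df, 1.0 - p)      # symmetry property
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:              # widen until the area is bracketed
        hi *= 2
    for _ in range(60):                   # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_value(12, 0.995), 3), round(t_value(29, 0.975), 3))
print(round(t_value(10, 0.10), 3))
# For large df the t-value approaches the z-value for the same area.
print(t_value(1000, 0.975), NormalDist().inv_cdf(0.975))
```

The negative result for an area of 0.10 comes from the symmetry branch in `t_value`, which mirrors the property used with the tables.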
Back to confidence intervals…
In a question, a standard deviation or variance will always be given. You
need to determine (from the context of the question) whether the value refers to
that of the sample or the whole population, i.e. whether the value represents S or σ.
Example 1:
The time (in seconds) taken to complete a simple task was recorded for
each of 15 randomly selected employees at a certain company. The values
are given below:
38.2 43.9 38.4 26.2 41.3 42.3 37.5 37.2 41.2 42.3 31 50.1 37.3 36.7 31.8
Calculate 95% and 99% confidence intervals for the mean time it takes all
the employees at this company to complete this task.
• Since the data obtained from the sample was given, we can use STAT
mode on the calculator to determine the sample mean ($\bar{x}$) and the
sample standard deviation (S).
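For a 95% confidence interval for the mean time it takes all the employees
at this company to complete this task:
n = 15, $\bar{x}$ = 38.36, S = 5.78
First, we need to determine which formula will be used. The standard deviation was calculated from the sample, so σ is unknown, and the sample is small; the formula with the t-multiplier is therefore used.
α/2 = 0.025 ∴ 1 − α/2 = 0.975 ∴ $t_{n-1;\,1-\frac{\alpha}{2}} = t_{14;\,0.975} = 2.145$
$$\bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}}\frac{S}{\sqrt{n}} = 38.36 \pm 2.145\left(\frac{5.78}{\sqrt{15}}\right) = (35.16\ ;\ 41.56)$$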
For a 99% confidence interval for the mean time it takes all the employees
at this company to complete this task:
n = 15, $\bar{x}$ = 38.36, S = 5.78
α/2 = 0.005 ∴ 1 − α/2 = 0.995 ∴ $t_{n-1;\,1-\frac{\alpha}{2}} = t_{14;\,0.995} = 2.977$
$$\bar{x} \pm t_{n-1;\,1-\frac{\alpha}{2}}\frac{S}{\sqrt{n}} = 38.36 \pm 2.977\left(\frac{5.78}{\sqrt{15}}\right) = (33.92\ ;\ 42.80)$$
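The calculator steps for this example can be mirrored in Python. This is a minimal sketch that computes the sample mean and sample standard deviation from the 15 recorded times and then applies x̄ ± t × S/√n; the t-multipliers 2.145 and 2.977 are typed in from the t-tables, and the function name `t_interval` is hypothetical.

```python
import math
from statistics import mean, stdev

# Task completion times (seconds) for the 15 sampled employees.
times = [38.2, 43.9, 38.4, 26.2, 41.3, 42.3, 37.5, 37.2,
         41.2, 42.3, 31, 50.1, 37.3, 36.7, 31.8]


def t_interval(data, t_multiplier):
    """Return (L, U) = x-bar -/+ t * S / sqrt(n), with S the sample std dev."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                      # divides by n - 1
    error = t_multiplier * s / math.sqrt(n)
    return xbar - error, xbar + error


print(round(mean(times), 2), round(stdev(times), 2))
for label, t in (("95%", 2.145), ("99%", 2.977)):
    lower, upper = t_interval(times, t)
    print(label, round(lower, 2), round(upper, 2))
```

The code carries the unrounded value of S through the calculation, whereas the slides round S to 5.78 first; for this data both routes give the same limits to two decimal places.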
When σ is unknown but the sample is large, use the C.I.
formula from Section 5.3 with a z-multiplier, but
replace σ by its sample estimate S:
$$\bar{x} \pm z_{1-\frac{\alpha}{2}}\frac{S}{\sqrt{n}} \qquad \text{(not directly on the formula sheet)}$$