MZB127 Topic 10 Lecture Notes (Unannotated Version)
Preface
Often, we are interested in drawing conclusions about one or two features of a distribution, such as
the mean (µ) of a continuous variable or the probability (p) of a specified outcome for a discrete
variable. These parameters can be estimated using the corresponding sample estimates (X̄ or p̂),
but these estimates won't be exactly correct, so a natural question we raised in Section 9.2.1
was:
How close might our sample statistic (data-based estimate) be to the “true” value
of the relevant distribution parameter?
We will finalise our discussion of this question in the present chapter by formalising the concept
of confidence intervals (Section 10.1).
A second question we raised in Section 9.2.1 concerned whether our data are consistent with an
assumed ("true") value of a distribution parameter. In this chapter we will also address this
second question, via the concept of hypothesis testing (Section 10.2).
For both questions, we must use the distributions of the estimates of the parameters we
are interested in, that is, the sampling distributions of either X̄ or p̂. However, unlike the
calculations we considered in Sections 9.2.2 and 9.2.3, we may have no information available to
us about the true parameters µ, σ and/or p, so our approaches must change accordingly.
Chapter 10. Confidence Intervals and Hypothesis Testing

10.1 Confidence Intervals

10.1.1 Estimating the true mean µ from the sample mean x̄, sample
standard deviation s and sample size n
Consider a Normal variable X ∼ N(µ, σ²), but now we don't know what µ or σ is.
Recall that, if the Central Limit Theorem holds or if X is already Normal, the mean X̄ of a
sample of size n from X follows a Normal distribution X̄ ∼ N(µ, σ²/n), and thus any obtained
sample mean x̄ can be converted to a sample from the standard Normal distribution Z ∼ N(0, 1)
via

z = (x̄ − µ_X̄)/σ_X̄ = (x̄ − µ)/(σ/√n).
If we have a sample mean x̄ from a sample of size n, we can also calculate a sample
standard deviation s. If we don't know what the true standard deviation σ is, it can be shown
(not here!) that we can actually replace σ by s in the above equation if we use the Student's
t-distribution with n − 1 degrees of freedom instead of the standard Normal distribution:

t = (x̄ − µ)/(s/√n), d = n − 1.
The above equation is a key application of the Student’s t-distribution (that we introduced
last week); this equation will also allow us to calculate a confidence interval for the true mean
µ, as we will see shortly. A confidence interval is a range of plausible values for the quantity
that we are trying to estimate. When talking about confidence intervals, we define a confidence
level (e.g. 90%, 95%, 99%) as the estimated probability that the true value of the quantity falls
within the range of plausible values that we specify. For example, we may wish to use our data
(x̄, s and n) to estimate, with 95% confidence, a range of plausible values for µ.
We usually use (1 − α) to denote the confidence level. For example, if α = 0.01, we are interested
in a 99% confidence interval; if α = 0.05, we are interested in a 95% confidence interval; etc. If
we use a to denote the maximum difference between x̄ and µ that defines the lower and upper
bounds of our confidence interval, we can therefore write:

Pr(−a < x̄ − µ < a) = 1 − α.

Thus, if we know a, then we know that there is a probability of (1 − α) that the difference
between the sample mean and true mean, x̄ − µ, lies between −a and a. Equivalently, we can say
that the 100(1 − α)% confidence interval for µ runs from x̄ − a to x̄ + a, or more simply, this
confidence interval is x̄ ± a. We must now try to work out what a is!
To do this, we assume that it is equally likely (or unlikely!) that x̄ − µ could exceed a versus
the possibility that x̄ − µ could be less than −a. This assumption, combined with the above
equation, gives us that:

Pr(x̄ − µ > a) = Pr(x̄ − µ < −a) = α/2.
If we then focus on the first probability expression in the equation above, and divide the
inequality inside the brackets of Pr(x̄ − µ > a) by s/√n, we obtain:

Pr((x̄ − µ)/(s/√n) > a/(s/√n)) = α/2.
We note that the term on the left-hand side of the inequality is described by a t-distribution with
degrees of freedom d = n − 1. Noting this, and with a slight abuse of notation, we end up with:

Pr(T > a/(s/√n)) = α/2, d = n − 1.
If we now denote t_{d,p} as the value of the t-distribution for d degrees of freedom that satisfies
Pr(T > t_{d,p}) = p, the above equation gives us that:

t_{n−1,α/2} = a/(s/√n), or equivalently, a = t_{n−1,α/2} × s/√n.
To obtain t_{n−1,α/2}, we can use Fawcett and Kent Table 6, or in Excel we can use the command
=T.INV(1-α/2, n-1) (see Section 9.1.2 for further details). The next few examples demonstrate
how to apply the equations above to calculate confidence intervals for the true mean µ, from a
sample mean x̄ and sample standard deviation s obtained from a sample of size n.
Examples
10.1.1.1 Live example: output voltage of a computer power supply revisited
In Example 9.2.2.1, we considered the output voltage of a computer power supply described
by the variable X with X ∼ N(6.5, 0.02²), that is, µ = 6.5 and σ = 0.02.
Now, suppose that the true mean µ and true standard deviation σ are both unknown, but
a random sample of 25 observations gave a sample mean of x̄ = 6.505 V and a sample
standard deviation s = 0.022 V. Find the value a such that Pr(−a < x̄ − µ < a) = 0.95.
Can you also explain what this value of a tells us?
A sample of MZB127 students are randomly selected and have their heights measured.
The measurements that were obtained are tabulated below.
Height (cm) 171 184 177 178 165 190 179 183
Compute a 95% confidence interval for the average height of an MZB127 student.
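One way to check the arithmetic for this live example is with a short Python sketch (our own illustration; it assumes the table value t_{7, 0.025} ≈ 2.365 for n − 1 = 7 degrees of freedom):

```python
import math
import statistics

heights = [171, 184, 177, 178, 165, 190, 179, 183]

n = len(heights)
xbar = statistics.mean(heights)   # sample mean
s = statistics.stdev(heights)     # sample standard deviation (n - 1 divisor)

t_7 = 2.365                       # t_{7, 0.025} from a t-table (assumed value)
a = t_7 * s / math.sqrt(n)        # half-width of the 95% interval

print(round(xbar, 1), round(a, 1))  # 178.4 6.5
```

Note that statistics.stdev uses the n − 1 divisor, matching the sample standard deviation s used throughout this section.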
(a) Use this information to provide a 99% confidence interval estimate for the true mean
µ of rainwater pH values in this area.
(b) Provide a 98% confidence interval estimate for the true mean µ of rainwater pH
values in this area.
Denoting the pH value of the rainwater samples as x, we have x̄ = 5.3, s = 0.64 and
n = 20. For a confidence level of 1 − α = 0.98, we have α/2 = 0.01, so we look up
t_{19,0.01} on Fawcett and Kent Table 6 and find that t_{19,0.01} = 2.539. The desired 98%
confidence interval for µ is then given by

x̄ ± a = x̄ ± t_{n−1,α/2} × s/√n = 5.3 ± 2.539 × 0.64/√20 ≈ 5.3 ± 0.36 = 4.94 to 5.66.
The first question we raised in Section 9.2.1 and reiterated at the start of the chapter, "How
close might our sample statistic (data-based estimate) be to the “true” value of the relevant
distribution parameter?", has now been answered for the case where the sample statistic is
the sample mean X̄! We next tackle the case where our sample statistic is the sample
proportion p̂.
10.1.2 Estimating the true proportion p from the sample proportion p̂
and sample size n

Recall that, for a sufficiently large sample of size n, the sample proportion p̂ approximately
follows a Normal distribution p̂ ∼ N(p, p(1 − p)/n), and thus any obtained sample proportion
can be converted (approximately) to a sample from the standard Normal distribution Z ∼ N(0, 1)
via

z = (p̂ − µ_p̂)/σ_p̂ = (p̂ − p)/√(p(1 − p)/n).
Note that the conversion above is straightforward if we know the true proportion p, but rather
complicated if we don't know p! The trick we employ here is to approximate, in the denominator,
p by p̂,

z ≈ (p̂ − p)/√(p̂(1 − p̂)/n),

as this makes later computations more straightforward. (Note that this means we have now
introduced two approximations: (1) approximating the binomial distribution by a Normal
distribution, and (2) approximating p by p̂. Alas, this is sometimes required in mathematics or
statistics to obtain an answer without getting stuck!)
If we only have a sample estimate, i.e. we only know the values p̂ and n, a confidence interval for
the true proportion p can be written as p̂ ± a,
where, like in Section 10.1.1, (1 − α) is the confidence level, and a is the value that denotes the
maximum difference between p̂ and p to define the lower and upper bounds of our confidence
interval. We now must find a suitable value for a.
Similarly to Section 10.1.1, we assume it is equally likely (or unlikely!) that p̂ − p could exceed
a versus the possibility that p̂ − p could be less than −a. This assumption, combined with the
above equation, gives us that:
Pr(p̂ − p > a) = Pr(p̂ − p < −a) = α/2.
We note that the term on the left-hand side of the inequality is described by a standard Normal
distribution. Noting this, and with a slight abuse of notation, we end up with:

Pr(Z > a/√(p̂(1 − p̂)/n)) = α/2.
If we now denote z_p as the value of the standard Normal distribution that satisfies Pr(Z > z_p) = p,
the above equation gives us that:

z_{α/2} = a/√(p̂(1 − p̂)/n), or equivalently, a = z_{α/2} × √(p̂(1 − p̂)/n).
To obtain z_{α/2}, we can use Fawcett and Kent Table 4, or in Excel we can use the command
=NORM.S.INV(1-α/2) (see Section 9.1.1 for further details). The next few examples demonstrate
how to apply the equations above to calculate confidence intervals for the true proportion p,
from a sample proportion p̂ obtained from a sample of size n.
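If tables are not to hand, z_{α/2} can also be computed in Python via statistics.NormalDist; the helper below is our own sketch (the function name is assumed, not from the notes):

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, alpha):
    """Return (lower, upper) for p_hat ± z_{α/2} * sqrt(p_hat(1-p_hat)/n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{α/2}, e.g. 1.96 for α = 0.05
    a = z * math.sqrt(p_hat * (1 - p_hat) / n)   # half-width of the interval
    return p_hat - a, p_hat + a

# 95% CI for the bike-track example: p_hat = 625/945, n = 945.
lo, hi = proportion_confidence_interval(625 / 945, 945, 0.05)
print(round(lo, 2), round(hi, 2))  # 0.63 0.69
```

NormalDist().inv_cdf(1-α/2) plays the same role as the Excel command =NORM.S.INV(1-α/2).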
Examples
10.1.2.1 Live example: fair 12-sided die revisited
In Example 9.2.3.1, we considered a game involving a 12-sided die with faces numbered
from 1 to 12.
Suppose that we no longer assume the die is fair but instead are trying to estimate the
true proportion p of 10s for this die using a sample of 150 throws and that our estimate
from this sample is p̂ = 0.08. Find a value a such that Pr(−a < p̂ − p < a) ≈ 0.9. Can
you also explain what this value of a tells us?
Consider that 945 people were observed on a bike track, including 625 cyclists.
(a) Estimate the true proportion p of cyclists among the users of this bike track.
(b) Determine the value a such that we can have 90% confidence that a sample proportion
from a sample of this size will lie within a of the true proportion p.
(c) What is the 95% confidence interval for the true proportion p?
We have X = 625, n = 945 and hence p̂ = 625/945 ≈ 0.6614. The confidence
level is 1 − α = 0.95, so z_{α/2} = z_{0.025} = 1.96 (from Fawcett and Kent Table 4). We
can then calculate a as

a = z_{α/2} × √(p̂(1 − p̂)/n) = 1.96 × √(0.6614 × 0.3386/945) ≈ 0.03017.
Thus the 95% confidence interval for the true proportion of cyclists is (approximately)
p = 0.66 ± 0.03, or between 0.63 and 0.69.
(a) Use this information to provide a 95% confidence interval for the true proportion p
of defective batteries for this plant.
(b) Estimate the true proportion of defective batteries under these production conditions
using a 90% confidence interval.
In this case, we have X = 9, n = 235 and so p̂ = 9/235 ≈ 0.0383. For a confidence
level of 1 − α = 0.90, we have α/2 = 0.05, so we look up z_{0.05} on Fawcett and Kent
Table 4 and find that z_{0.05} = 1.64485. The desired 90% confidence interval for p is
then given by

p̂ ± a = p̂ ± z_{α/2} × √(p̂(1 − p̂)/n) = 0.0383 ± 1.64485 × √(0.0383 × 0.9617/235)
≈ 0.0383 ± 0.0206 = 0.0177 to 0.0589.
The first question we raised in Section 9.2.1 and reiterated at the start of the chapter, “How
close might our sample statistic (data-based estimate) be to the “true” value of the relevant
distribution parameter?”, has now been answered for the case where the sample statistic
(data-based estimate) is the sample proportion p̂.
10.2 Hypothesis Tests

The essential concept in hypothesis testing is that we compare an assumption against the sample
evidence by calculating a probability of the form

Pr(observing the sample we observed, assuming that the hypothesis (assumption) is true).
The smaller this probability, the less we are inclined to accept the assumption as a reasonable
description (model) for the observations in our sample. So a small probability provides evidence
against the assumption, whereas a “not small” probability doesn’t. It is also worth noting that
a statistical test cannot prove that an assumption is false (nor that it is true); it can only tell
you how strong (or weak) the evidence of the sample is for rejecting the assumption.
The assumption that is being tested is usually known as the null hypothesis, denoted as H0 for
short; this is the assumption that will be accepted (for the moment) unless the evidence from
the sample is strong enough to reject it. In a formal statistical test, this assumption should be
stated at the start of the test. The probability that is calculated (based on the assumption and
the sample) is usually known as the p-value (short for probability). Clearly, the assumption that
is to be tested will vary from one situation to another. The type of assumption to be tested will
also vary between situations, and this will change the way we calculate the p-value from the
data and the assumption. However, the process for drawing the conclusion (using the p-value to
assess the strength of evidence against the assumption) is the same for all formal statistical
tests.
In some fields of study, fixed values (typically 5% or 1%) are routinely used as “cut-offs” for
deciding whether to reject an assumption and the result is just reported as “H0 rejected” or
“H0 accepted” (interpreted into the relevant context). This is not an ideal practice, as it makes
an artificial distinction between similar p-values such as 0.049 and 0.051 and gives no indication
of how strong (or weak) the evidence against H0 actually is. The best approach, as indicated
above, is to quote the actual p-value and to interpret the corresponding strength of evidence
against H0 , stating the conclusion to be drawn from this in the original context for the test.
10.2.2 Hypothesis tests for the true mean µ

Recall from Section 10.1.1 that sample means x̄ can be converted to values of the Student's
t-distribution via

t = (x̄ − µ)/(s/√n), d = n − 1.

This allows us to make probability statements about the sample mean x̄, assuming that µ is
known. If we have an assumption (H0) about µ, we can therefore use an appropriately large
sample of data together with the formula above to obtain evidence regarding the assumption;
that is, to give us a p-value for testing the assumption.
Note: These tests rely on the validity of the sampling distribution for X̄ described by the
equation above, which requires in turn that the variable is Normally distributed or that n ≥ 30.
This should be checked (at least roughly) before proceeding with such a test.
It is also useful to specify the assumption that will be accepted if H0 is rejected. This is known
as the alternative hypothesis and is usually denoted H1 (or, sometimes, HA). The
alternative hypothesis H1 tells us how to find the p-value. For example:
In testing whether there is bias in a measured quantity (with true value M ), we test
H0 : µ = M vs H1 : µ ̸= M .
In testing whether process changes have reduced the proportion of defectives below the
previous rate d, we test H0 : p = d against H1 : p < d.
When H1 includes < or >, it’s called a one-sided alternative and the test is often known as a
one-sided test. Likewise, when H1 includes ̸=, we have a two-sided alternative and a two-sided
test. The choice of alternative hypothesis only affects how we find the p-value from the table, as
follows:
For a one-sided test, our p-value is either Pr(T < t) or Pr(T > t) (depending on the direction
of H1), so the p-value is simply the relevant one of these two probabilities.
For a two-sided test, our p-value will be Pr(T < −|t|) + Pr(T > |t|) = 2 × Pr(T > |t|), so
the p-value is double the value of Pr(T > |t|).
Examples
10.2.2.1 Live Example: MZB127 students
A sample of MZB127 students are randomly selected and have their heights measured.
The measurements that were obtained are tabulated below.
Height (cm) 171 184 177 178 165 190 179 183
(a) Is there evidence to reject the hypothesis/assumption that the average height of an
MZB127 student is 184 cm?
(b) Is there evidence to suggest that the average height is less than 184 cm?
We can now look up the value 2.80 on the t-distribution with n − 1 = 19 degrees of
freedom and we find that Pr(T ≥ 2.80) ≈ 0.00571. (The table we use is Table 5 in the
tables that have been posted on Canvas.) As this is a one-sided test, Pr(T > 2.80) = 0.00571 is
the p-value for the test.
This is a very small probability, so there is very strong evidence against H0 and we have
very strong evidence to conclude that the true mean pH of rainfalls in this area is more
acidic than the standard.
10.2.3 Hypothesis tests for the true proportion p

In Section 10.1.2, we noted that values of p̂ from its approximate Normal distribution can be
converted to values z from the standard Normal distribution via:

z = (p̂ − p)/√(p(1 − p)/n).
We can then use the corresponding Normal probabilities associated with this z-value to carry
out a hypothesis test on a proposed proportion p.
Note that H0 gives us a value of p for this calculation, so there is no need for any further
approximation. Once again, H1 can be either one-sided or two-sided, with
For a one-sided test, the p-value equals Pr(Z < z) or Pr(Z > z) (depending on the direction
of H1), so the p-value is equal to the relevant one of these two probabilities.
For a two-sided test, the p-value equals Pr(Z < −|z|) + Pr(Z > |z|) = 2 × Pr(Z > |z|),
so the p-value is double the value of Pr(Z > |z|).
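These one- and two-sided rules can be sketched in Python using statistics.NormalDist for the standard Normal probabilities (the function name and the "less"/"greater"/"two-sided" labels are our own, not from the notes):

```python
from statistics import NormalDist

def z_test_p_value(z, alternative):
    """p-value for a z test statistic.

    alternative: "less" for H1 with <, "greater" for H1 with >,
    "two-sided" for H1 with != (doubles the tail probability).
    """
    phi = NormalDist().cdf
    if alternative == "less":
        return phi(z)                 # Pr(Z < z)
    if alternative == "greater":
        return 1 - phi(z)             # Pr(Z > z)
    return 2 * (1 - phi(abs(z)))      # 2 * Pr(Z > |z|)

# Two-sided p-value for z = 1.41:
print(round(z_test_p_value(1.41, "two-sided"), 4))  # 0.1585
```

This reproduces, without table lookups, the kind of calculation done by hand in the examples that follow.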
Examples
10.2.3.1 Live Example: Bike track users revisited
An investigation into the usage of a bike track observed 945 people using the track, of
whom 625 were cyclists. Historical data indicates that the proportion of users on this
track who are cyclists is 65%.
(a) Is there any evidence from this sample to suggest that this proportion has changed
since then?
(b) Is there any evidence from this sample to suggest that this proportion has increased?
In Example 10.1.2.3, electrical cells (batteries) from a battery plant were tested and we
considered the proportion of defective cells being produced under the current operating
conditions. Suppose that the company had previously quoted the proportion of defective
cells at 6%. Having put procedures in place to improve their manufacturing process, they
want to know whether the data from the latest sample (9 defective cells in a sample of
235) provide evidence of a decrease in the proportion of defective cells.
If we denote the proportion of defective cells as p, we are testing H0 : p = 0.06. Since we are
interested in whether there is evidence of a decrease in p, our alternative is H1 : p < 0.06.
Our sample gives a sample proportion of p̂ = 9/235 ≈ 0.0383. The probability of observing
a sample at least this unusual (if H0 is true) can be calculated approximately as:
Pr(p̂ ≤ 0.0383) = Pr(Z ≤ (0.0383 − 0.06)/√(0.06 × 0.94/235))
= Pr(Z ≤ −1.40)
= Pr(Z > 1.40)
≈ 0.0808 [from Table 3 of the probability tables]
This is the p-value for our test, so there is only very slight evidence against H0 ; that is,
there is only slight evidence that the true proportion of defectives has decreased.
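As a cross-check on the arithmetic above, the z-value and p-value can be recomputed directly in Python (our own sketch; computing without intermediate table rounding may give slightly different values from a hand calculation):

```python
import math
from statistics import NormalDist

p0 = 0.06           # proportion of defectives under H0
n = 235
p_hat = 9 / 235     # observed sample proportion

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # test statistic under H0
p_value = NormalDist().cdf(z)                    # one-sided test: Pr(Z <= z)

print(round(z, 2), round(p_value, 3))  # -1.4 0.081
```

Either way, the p-value is roughly 0.08, so the strength of evidence (and the conclusion) is unchanged.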
A fair coin is assumed to have an equal chance of producing either a head or a tail.
Suppose we throw a fair coin 200 times and observe 110 heads (and hence 90 tails). Does
this sample provide evidence that the coin is biased?
Let p be the proportion of heads from tossing this coin. Since we would be equally
concerned by a change in either direction, we are testing H0 : p = 0.5 vs H1 : p ≠ 0.5 in
this case.
Our sample gives a sample proportion of p̂ = 110/200 = 0.55. The test statistic for this
test is
z = (p̂ − p)/√(p(1 − p)/n) = (0.55 − 0.5)/√(0.5 × 0.5/200) ≈ 1.41.
Now, since our alternative hypothesis here is that p ̸= 0.5, we are actually looking to
see whether p̂ is too different from 0.5 on either side, in which case 90 heads (or fewer)
counts as being just as unusual as 110 heads (or more). It follows therefore that our
p-value in this situation (namely, Pr(observing a sample at least as unusual as this one
assuming H0 is true)) is not just Pr(Z > 1.41) but rather Pr(Z < −1.41)+Pr(Z > 1.41) =
2 × Pr(Z > 1.41).
From Table 3, Pr(Z > 1.41) ≈ 0.07927, so our p-value is therefore ≈ 2 × 0.07927 = 0.15854.
This is not a low p-value, so there is not enough evidence to reject H0 and we conclude
that we do not have evidence (from this sample) that the coin is biased.
4. Look up the calculated value of the test statistic on the relevant table (t-distribution with
d = n − 1 degrees of freedom, or standard Normal distribution) to find the p-value (paying
attention also to H1 ).
5. State the strength of evidence (if any) against H0 , based on the p-value.