2016 2017 Exam
Two hours
INTRODUCTION TO STATISTICS
30 May 2017
14.00 – 16.00
Answer ALL FOUR questions in Section A (10 marks each) and TWO of the THREE questions
in Section B (20 marks each). If more than TWO questions from Section B are attempted, then
credit will be given for the best TWO answers.
Electronic calculators may be used, provided that they cannot store text.
1 of 8 P.T.O.
MATH10282
SECTION A
A1. Suppose that we have a sample of observations $x_1, \ldots, x_n$ obtained by random sampling from a
continuous distribution with probability density function $f(x)$.
We wish to calculate a density histogram. Suppose that $x_1, \ldots, x_n \in (a_1, a_{K+1}]$ and that the range
$(a_1, a_{K+1}]$ is divided evenly into $K$ subintervals, or bins, $B_k = (a_k, a_{k+1}]$, $k = 1, \ldots, K$, of width $h$.
(i) Write down an expression for the function Hist(x) defining the density histogram based on bins
$B_k$. Explain any notation you use.
[2 marks]
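As an illustration of the construction described in A1, the following is a minimal sketch that assumes the usual definition in which the bar over bin $B_k$ has height (number of observations in $B_k$) divided by $nh$. The function name `density_histogram` and the sample values are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def density_histogram(data: Sequence[float], a1: float, h: float, K: int) -> List[float]:
    """Return the K bar heights of a density histogram on bins (a_k, a_k + h].

    Assumes the bar height over bin k is count_k / (n * h), so that the
    total area of the bars equals 1.
    """
    n = len(data)
    counts = [0] * K
    for x in data:
        # 0-based bin index k with a1 + k*h < x <= a1 + (k+1)*h
        # (bins are open on the left and closed on the right).
        k = math.ceil((x - a1) / h) - 1
        if not 0 <= k < K:
            raise ValueError(f"observation {x} lies outside (a_1, a_(K+1)]")
        counts[k] += 1
    return [c / (n * h) for c in counts]


# Hypothetical data on (0, 4] with K = 4 bins of width h = 1.
sample = [0.5, 1.0, 1.5, 2.5, 3.5, 4.0]
heights = density_histogram(sample, a1=0.0, h=1.0, K=4)
print(heights)
print("total area:", sum(heights) * 1.0)
```

Dividing by $nh$ rather than by $n$ is what makes the bars integrate to one, which is the property that lets the histogram be read as an estimate of $f(x)$.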
(iii) Using your answer to part (ii), or otherwise, estimate the mean of the distribution that
generated these data.
[3 marks]
[Total 10 marks]
A2. Suppose that $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ independently, and let $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ denote
the sample variance.
[5 marks]
(ii) State the distribution of $(n-1)S^2/\sigma^2$, and give formulae for the end-points of a $100(1-\alpha)\%$
confidence interval for $\sigma^2$.
[3 marks]
(iii) A data set is obtained with $n = 10$, $\sum_{i=1}^{10} x_i = 859$, and $\sum_{i=1}^{10} x_i^2 = 74227$. Calculate a 95%
confidence interval for $\sigma^2$.
[2 marks]
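The arithmetic behind A2(iii) can be sketched as follows, assuming the standard chi-squared interval with end-points $(n-1)S^2/\chi^2_{n-1,\,1-\alpha/2}$ and $(n-1)S^2/\chi^2_{n-1,\,\alpha/2}$. The two chi-squared quantiles are hard-coded from standard tables for 9 degrees of freedom; they are an assumption of this sketch, not values given in the paper.

```python
n = 10
sum_x = 859.0
sum_x2 = 74227.0

# Corrected sum of squares: sum (x_i - xbar)^2 = sum x_i^2 - (sum x_i)^2 / n,
# which equals (n - 1) * S^2.
ss = sum_x2 - sum_x ** 2 / n
s2 = ss / (n - 1)

# Chi-squared quantiles with n - 1 = 9 degrees of freedom, copied from
# standard statistical tables (assumed values, not computed here).
chi2_lower = 2.700   # 0.025 quantile
chi2_upper = 19.023  # 0.975 quantile

# Dividing by the larger quantile gives the lower end-point, and vice versa.
ci = (ss / chi2_upper, ss / chi2_lower)

print(f"(n-1)S^2 = {ss:.1f}, S^2 = {s2:.3f}")
print(f"95% CI for sigma^2: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval is not symmetric about $S^2$ because the chi-squared distribution is skewed.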
[Total 10 marks]
(ii) If n is large, what is the approximate sampling distribution of X̄? State any results you use.
[2 marks]
(iii) Suppose that a sample of size $n = 75$ is obtained, for which $\sum_{i=1}^{n} x_i = 1147$. Use your answer
to part (ii) to estimate the approximate probability that, for a future sample of size $m = 80$
from the same distribution, the value of $\bar{X}_m$ is greater than 15.25.
[4 marks]
[Total 10 marks]
(i) Write down an appropriate unbiased estimator, $\hat{\sigma}^2$, of the common variance $\sigma^2$. Write down a
suitably scaled version of $\hat{\sigma}^2$ and state its sampling distribution.
[3 marks]
(ii) Suggest a suitable test statistic for testing $H_0$ vs $H_1$, and state its exact distribution in the
case that $H_0$ is true. Give an appropriate rejection region to achieve significance level $\alpha$.
[3 marks]
Do you reject $H_0$ if $\theta_0 = 1$ and $\alpha = 0.05$? How about if $\alpha = 0.01$? Show your working.
[4 marks]
[Total 10 marks]
SECTION B
(i) Define the following terms: Type I error, Type II error, and the significance level of the test
(also known as the size).
[3 marks]
(ii) Write down an appropriate test statistic and rejection region, making sure to define any notation
used. Show that this choice does indeed achieve a significance level of α.
[4 marks]
(iii) Suppose for this part that $\mu_0 = 10$, $\sigma^2 = 1$ and a data set is obtained with $n = 10$ and $\sum_{i=1}^{n} x_i = 106$.
Do you reject $H_0$ at significance level $\alpha = 0.05$? What if $\alpha = 0.01$?
[3 marks]
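The calculation in part (iii) can be sketched as below. The statement of the hypotheses is not shown in this part of the paper, so the sketch assumes normal data with known $\sigma$ and a one-sided test of $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$ that rejects when $Z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n}) > z_\alpha$. That form is suggested by the power expression in part (v) but is an assumption here. The helper name `reject_at` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 10.0
sigma = 1.0   # sigma^2 = 1
n = 10
sum_x = 106.0

xbar = sum_x / n
# Standardised test statistic under H0 (known sigma).
z = (xbar - mu0) / (sigma / sqrt(n))


def reject_at(alpha: float) -> bool:
    """Assumed one-sided rule: reject H0 when z exceeds the upper-alpha normal quantile."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return z > z_alpha


print(f"xbar = {xbar:.2f}, z = {z:.3f}")
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject H0? {reject_at(alpha)}")
```

Under a two-sided alternative the critical values would be $z_{\alpha/2}$ instead, and the conclusions could differ.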
(iv) Suppose for this part that a data set is obtained with $\sum_{i=1}^{n} x_i = n\mu_0 + 0.2n\sigma$. What is the
minimum value of n for which H0 is rejected at (i) a 5% significance level, and (ii) a 1%
significance level?
[3 marks]
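For part (iv), the data give $\bar{x} - \mu_0 = 0.2\sigma$, so the standardised statistic reduces to $z = 0.2\sqrt{n}$. Under the same assumed one-sided rule (reject when $z > z_\alpha$, an assumption since the hypotheses are not shown in this part), the smallest rejecting $n$ is the smallest integer strictly greater than $(z_\alpha/0.2)^2$. A minimal sketch, with the hypothetical helper `min_n`:

```python
from math import floor
from statistics import NormalDist


def min_n(alpha: float, effect: float = 0.2) -> int:
    """Smallest n with effect * sqrt(n) > z_alpha (assumed one-sided rejection rule).

    `effect` is (xbar - mu0) / sigma, which equals 0.2 for the data in part (iv).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    threshold = (z_alpha / effect) ** 2
    # Smallest integer strictly greater than the threshold.
    return floor(threshold) + 1


print("5% level:", min_n(0.05))
print("1% level:", min_n(0.01))
```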
(v) Show that for $\mu > \mu_0$ the probability of correctly rejecting $H_0$ is equal to
$$1 - \Phi\left( z_\alpha + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}} \right).$$
[7 marks]
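One possible route to the expression in part (v) is outlined below. It assumes that the test rejects $H_0$ when $(\bar{X} - \mu_0)/(\sigma/\sqrt{n}) > z_\alpha$ and that $\bar{X} \sim N(\mu, \sigma^2/n)$ when the true mean is $\mu$. Both are assumptions of this sketch, since the set-up of the question is not shown in this part of the paper.

```latex
\begin{align*}
P_\mu(\text{reject } H_0)
  &= P_\mu\!\left( \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > z_\alpha \right) \\
  &= P_\mu\!\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}
       > z_\alpha + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}} \right) \\
  &= 1 - \Phi\!\left( z_\alpha + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}} \right),
\end{align*}
```

where the second line subtracts $(\mu - \mu_0)/(\sigma/\sqrt{n})$ from both sides, and the last line uses the fact that $(\bar{X} - \mu)/(\sigma/\sqrt{n}) \sim N(0, 1)$ under the true mean $\mu$.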
[Total 20 marks]
B6. A random variable X has a negative binomial distribution NegBi(r, p), with p ∈ (0, 1) and r a
positive integer, if it has probability mass function (p.m.f.)
$$p_X(x) = P(X = x) = \binom{x + r - 1}{x} (1 - p)^r p^x, \qquad x = 0, 1, 2, \ldots,$$
where the notation $\binom{m}{k} = \frac{m!}{k!\,(m-k)!}$ denotes a binomial coefficient.
(i) Show that if $X_1, \ldots, X_n \sim \mathrm{NegBi}(r, p)$ independently, with r known and p unknown, then the
likelihood function is
$$L(p) = \left[ \prod_{i=1}^{n} \binom{X_i + r - 1}{X_i} \right] (1 - p)^{nr} \, p^{\sum_{i=1}^{n} X_i}.$$
[2 marks]
$$\hat{p} = \frac{\bar{X}}{r + \bar{X}}.$$
[7 marks]
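The wording of the part that introduces $\hat{p}$ is not shown here. Assuming it concerns the maximiser of the likelihood $L(p)$ from part (i), a quick numerical check can be sketched as follows. The data values and the helper name `log_likelihood` are hypothetical.

```python
from math import log
from typing import Sequence


def log_likelihood(p: float, xs: Sequence[int], r: int) -> float:
    """log L(p) up to an additive constant that does not depend on p.

    From part (i): L(p) is proportional to (1 - p)^(n r) * p^(sum of x_i).
    """
    n = len(xs)
    return n * r * log(1 - p) + sum(xs) * log(p)


# Hypothetical sample, with r treated as known.
xs = [2, 5, 3, 0, 4, 6, 1, 3]
r = 3

xbar = sum(xs) / len(xs)
p_hat = xbar / (r + xbar)

# Crude grid search over (0, 1) to locate the maximiser numerically.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: log_likelihood(p, xs, r))

print(f"closed form: {p_hat:.3f}, grid maximiser: {p_grid:.3f}")
```

The grid search only confirms the closed form on one data set. Setting the derivative of the log-likelihood, $-nr/(1-p) + \sum_i x_i / p$, to zero gives the same expression in general.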
(iv) Use the result given in part (iii) to calculate the approximate probability that 0.45 < p̂ < 0.55
when p = 0.5, r = 3 and n = 100.
[Hint: you may use without proof that $E(X) = pr/(1 - p)$ and $\mathrm{Var}(X) = pr/(1 - p)^2$.]
[5 marks]
[Total 20 marks]
B7. The population proportion of batteries of Type 1 that are defective is p1 , and the population
proportion of batteries of Type 2 that are defective is p2 . Both p1 and p2 are unknown, and it is
desired to estimate the difference θ = p1 − p2 .
To investigate this, an experimenter will examine n batteries of Type 1 and m batteries of Type 2,
and record how many of each type are defective.
Let X1 denote the number of defectives in the sample of Type 1 batteries, and let X2 denote the
number of defectives in the sample of Type 2 batteries.
(a) (i) Suggest an appropriate estimator, θ̂, for θ and define any notation used.
[2 marks]
(ii) Show that θ̂ is unbiased, and derive an expression for v(n, m) = Var θ̂.
[6 marks]
(b) The investigator considers two possible experimental designs with total sample size n+m = 300:
(i) Show that Var θ̂ is smaller under Design 1 than under Design 2 if and only if
$$\frac{p_1 (1 - p_1)}{p_2 (1 - p_2)} \ge \frac{1}{2}.$$
[4 marks]
(ii) Hence argue that, when p1 = 0.1, the estimator θ̂ performs better if we use Design 1 rather
than Design 2 provided that p2 < 0.235 or p2 > 0.765.
[4 marks]
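The thresholds quoted in part (b)(ii) can be checked numerically. With $p_1 = 0.1$, the condition in part (b)(i) becomes $p_2(1 - p_2) \le 2 p_1 (1 - p_1) = 0.18$, whose boundary is the quadratic $p_2^2 - p_2 + 0.18 = 0$. The sketch below simply solves that quadratic; the variable names are illustrative.

```python
from math import sqrt

p1 = 0.1
# Boundary of the condition p1(1 - p1) / (p2(1 - p2)) >= 1/2:
#   p2 (1 - p2) = 2 * p1 * (1 - p1)
rhs = 2 * p1 * (1 - p1)

# Solve p2^2 - p2 + rhs = 0 with the quadratic formula.
disc = 1 - 4 * rhs
lower = (1 - sqrt(disc)) / 2
upper = (1 + sqrt(disc)) / 2

print(f"boundary values of p2: {lower:.3f} and {upper:.3f}")
```

Since $p_2(1 - p_2)$ is largest at $p_2 = 0.5$, the condition holds outside the two roots, which matches the stated ranges for $p_2$.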
(c) Now suppose that the experimenter finds that 13 out of n = 150 batteries of Type 1 are
defective and 17 out of m = 150 batteries of Type 2 are defective.
[Total 20 marks]