CS1 Specimen Questions and Solutions
July 2020
Q1
A survey showed that 40% of investors invest in at least two companies in order to
diversify their risk. Let X be the random variable denoting the number of investors
who have invested in more than one company in a random sample of 300 investors.
Solution
(i) X follows a binomial (300, 0.4) distribution [0.5]
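As an illustration of the distribution named above, the following is a minimal sketch that builds the Binomial(300, 0.4) probability function with the Python standard library and checks its mean and variance against n*p and n*p*(1-p). The helper name `binomial_pmf` is an assumption introduced for this example; it is not part of the original solution.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 300, 0.4  # sample size and proportion stated in the question

# Full probability function over k = 0, 1, ..., n
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(f"total probability = {sum(pmf):.6f}")
print(f"E(X)   = {mean:.4f}  (n*p       = {n * p:.4f})")
print(f"var(X) = {variance:.4f}  (n*p*(1-p) = {n * p * (1 - p):.4f})")
```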
Q2
Let $X_1, X_2, \ldots, X_n$ be a random sample consisting of independent random variables
with mean $\mu$ and variance $\sigma^2$. Consider the sample mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
In your answer you may denote $\mu$ by mu, $\sigma^2$ by sigma2 and $\bar{X}$ by Xbar.
(iii) Comment on the variance of variable $\bar{X}$ as compared to the variance of $X_i$. [1]
An actuary is interested in exploring the difference in the size of claim losses from
two insurance portfolios, and can take samples of claims from these portfolios.
(iv) Explain how the answer to part (iii) can affect the precision of the actuary’s
comparison. [2]
[Total 6]
Solution
(iii) The variance of the sample mean is sigma2/n, which is smaller than the variance sigma2 of an
individual variable by a factor of n. [1]
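The result in part (iii) can be derived in one line. This is a sketch that relies only on the assumptions stated in the question (independent $X_i$ with common variance $\sigma^2$):

$$\operatorname{var}(\bar{X}) = \operatorname{var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

The second equality uses independence, so that the variance of the sum is the sum of the variances.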
(iv) Individual values are less precise than the average of a sample. The actuary should take large
samples and compare the means to improve the precision of the comparison.
[2]
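To make the effect of sample size concrete, the sketch below computes the standard deviation of the difference between two independent sample means, sqrt(sigma2_A/n_A + sigma2_B/n_B). The claim-size variances are hypothetical values chosen only for illustration, and the function name `se_difference_of_means` is likewise an assumption of this example.

```python
from math import sqrt

def se_difference_of_means(sigma2_a: float, sigma2_b: float, n_a: int, n_b: int) -> float:
    """Standard deviation of Xbar_A - Xbar_B for two independent samples."""
    return sqrt(sigma2_a / n_a + sigma2_b / n_b)

# Hypothetical claim-size variances for the two portfolios (not from the source).
sigma2_a, sigma2_b = 400.0, 900.0

for n in (1, 25, 100, 400):
    se = se_difference_of_means(sigma2_a, sigma2_b, n, n)
    print(f"n = {n:>3} per portfolio: sd of difference in means = {se:.3f}")
```

With n = 1 the comparison is between two individual claims; quadrupling the sample size halves the standard deviation of the difference.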
Q3
X and Y are discrete random variables with joint distribution as follows:
(i) (a) Identify which one of the following options gives the correct value of the
expectation $E(Y \mid X = 1)$: [2]
(A1) 1.3789
(A2) 2.6087
(A3) 3.1398
(A4) 4.0945
(b) Identify which one of the following options gives the correct value of the
variance $\operatorname{var}(X \mid Y = 3)$: [2]
(A1) 1.2487
(A2) 0.9832
(A3) 1.9388
(A4) 2.2235
(ii) Calculate the probability functions of the marginal distributions for X and Y.
[2]
Solution
(i)(a)
$$E(Y \mid X = 1) = \sum_y y \, P(Y = y \mid X = 1) = \sum_y y \, \frac{P(Y = y, X = 1)}{P(X = 1)} = 2.6087$$
Ans: (A2) [2]
[Details are for information. Candidates will not be required to show working.]
(i)(b)
$$\operatorname{var}(X \mid Y = 3) = E(X^2 \mid Y = 3) - [E(X \mid Y = 3)]^2 = 2.3214 - (1.0357)^2 = 1.2487$$
Ans: (A1) [2]
[Details are for information. Candidates will not be required to show working.]
(iii) Test whether P(X = x, Y = y) = P(X = x) P(Y = y) for all pairs (x, y). [1]
Show that this equality does not hold for at least one pair, for example:
P(X = 0, Y = -1) = 0.08, which is not equal to P(X = 0) P(Y = -1).
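The calculations in this question (marginal distributions, conditional mean and variance, and the product test in part (iii)) all follow mechanically from a joint probability table. The sketch below shows one way to organise them. The joint table used here is hypothetical, since the table from the question is not reproduced in this text, so the printed numbers will not match the answers above. All function names are assumptions of this example.

```python
from collections import defaultdict

# Hypothetical joint pmf {(x, y): probability}; NOT the table from the question.
joint = {(0, 1): 0.10, (0, 2): 0.20, (1, 1): 0.30, (1, 2): 0.40}

def marginals(joint):
    """Return the marginal probability functions of X and Y as dicts."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return dict(px), dict(py)

def conditional_mean_y(joint, x0):
    """E(Y | X = x0) = sum_y y * P(Y = y, X = x0) / P(X = x0)."""
    p_x0 = sum(p for (x, _), p in joint.items() if x == x0)
    return sum(y * p for (x, y), p in joint.items() if x == x0) / p_x0

def conditional_var_x(joint, y0):
    """var(X | Y = y0) = E(X^2 | Y = y0) - [E(X | Y = y0)]^2."""
    p_y0 = sum(p for (_, y), p in joint.items() if y == y0)
    e_x = sum(x * p for (x, y), p in joint.items() if y == y0) / p_y0
    e_x2 = sum(x * x * p for (x, y), p in joint.items() if y == y0) / p_y0
    return e_x2 - e_x**2

def is_independent(joint, tol=1e-9):
    """True only if P(X=x, Y=y) = P(X=x) P(Y=y) for every pair (x, y)."""
    px, py = marginals(joint)
    return all(
        abs(joint.get((x, y), 0.0) - px[x] * py[y]) <= tol
        for x in px
        for y in py
    )

px, py = marginals(joint)
print("P(X = x):", px)
print("P(Y = y):", py)
print("E(Y | X = 1)   =", round(conditional_mean_y(joint, 1), 4))
print("var(X | Y = 2) =", round(conditional_var_x(joint, 2), 4))
print("independent?   ", is_independent(joint))
```

A single failing pair is enough for `is_independent` to return False, which mirrors the marking point in part (iii).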
Q4
An actuary is asked to check a linear regression calculation performed by a trainee.
The trainee reports a least squares slope parameter estimate of $\hat{b} = 13.7$ and a sample
correlation coefficient $r = -0.89$.
(i) Justify why this suggests that the trainee has made an error. [2]
[Figure: histogram of the residuals]
(ii) Comment on the validity of the assumptions of the linear model. [2]
x:    0      1      2       3       4       5       6       7       8       9
y: −1.35  −4.96  −9.20  −13.15  −16.70  −21.23  −25.14  −28.44  −33.68  −37.39
for which
$$\bar{y} = -19.124, \qquad \sum_{i=1}^{10} (y_i - \bar{y})^2 = 1{,}329.523, \qquad \sum_{i=1}^{10} (x_i - \bar{x})^2 = 82.5$$
A linear model of the form $y = a + bx + e$ is fitted to the data, where the error terms
($e$) independently follow a $N(0, \sigma^2)$ distribution, and where $a$, $b$ and $\sigma^2$ are unknown
parameters.
(iii) Determine the fitted line of the regression model. [3]
(iv) (a) Identify which one of the following options gives the correct estimate of
the variance $\sigma^2$ of the model. [1]
(A1) 0.612
(A2) 1.098
(A3) 0.971
(A4) 0.139
(b) Identify which one of the following options gives the correct estimate of
the variance of the predicted mean response if x = 11. [2]
(A1) 0.161
(A2) 0.085
(A3) 0.287
(A4) 0.309
(c) Calculate a 95% confidence interval for the predicted mean response if
x = 11. [2]
(v) Comment on the width of a 95% confidence interval for the predicted mean
response if x = 3.5, as compared to the width of the interval in part (iv),
without calculating the new interval. [2]
[Total 14]
Solution
(i) The regression slope suggests a positive relationship between the two variables, while the
correlation coefficient shows a strong negative relationship. [2]
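The reason the two signs must agree can be seen from the standard definitions of the slope estimate and the sample correlation coefficient (the $S_{xx}$, $S_{yy}$, $S_{xy}$ notation is the usual sums-of-squares notation):

$$\hat{b} = \frac{S_{xy}}{S_{xx}}, \qquad r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \quad\Longrightarrow\quad \hat{b} = r \sqrt{\frac{S_{yy}}{S_{xx}}}$$

Since $S_{xx}$ and $S_{yy}$ are positive, $\hat{b}$ and $r$ always have the same sign as $S_{xy}$, so a positive slope cannot occur together with a negative correlation.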
(ii) The histogram suggests a non-symmetric distribution for the residuals and therefore the
assumption that the errors follow a N(0,sigma^2) distribution does not seem valid.
[2]
(iv) (a)
$$\hat{\sigma}^2 = \frac{S_{yy} - \dfrac{S_{xy}^2}{S_{xx}}}{n - 2} = \frac{1329.523 - \dfrac{(-331.05)^2}{82.5}}{8} = 0.139$$
[Details are for information. Candidates will not be required to show working.]
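The summary statistics, the fitted line and the variance estimate can be reproduced directly from the ten data points given in the question. The following is a minimal sketch using only the Python standard library; the variable names are assumptions of this example.

```python
# Data from the question
x = list(range(10))
y = [-1.35, -4.96, -9.20, -13.15, -16.70, -21.23, -25.14, -28.44, -33.68, -37.39]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Corrected sums of squares and cross-products
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Least squares estimates for y = a + b*x + e
b_hat = sxy / sxx
a_hat = ybar - b_hat * xbar

# Unbiased estimate of the error variance, with n - 2 degrees of freedom
sigma2_hat = (syy - sxy**2 / sxx) / (n - 2)

print(f"Sxx = {sxx:.3f}, Syy = {syy:.3f}, Sxy = {sxy:.3f}")
print(f"fitted line: y = {a_hat:.3f} + ({b_hat:.3f}) x")
print(f"sigma2_hat = {sigma2_hat:.3f}")
```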
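(b)
$$V(\hat{y}) = \left(\frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}\right) \times \hat{\sigma}^2 = \left(\frac{1}{10} + \frac{\left(11 - \frac{45}{10}\right)^2}{82.5}\right) \times 0.139 = 0.085$$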
[Details are for information. Candidates will not be required to show working.]
(c)
Predicted value is: yhat = -1.066 - 4.013 * 11 = -45.209 [0.5]
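The remaining step for the interval is yhat ± t × sqrt(V(yhat)). The sketch below carries this out with the rounded values quoted in parts (b) and (c). The critical value 2.306 is the upper 2.5% point of the t distribution with n - 2 = 8 degrees of freedom, taken from standard tables; it is an assumption of this example and is not stated in the text above.

```python
from math import sqrt

# Rounded values quoted in the solution
y_hat = -45.209    # predicted mean response at x = 11
var_y_hat = 0.085  # estimated variance of the predicted mean response

# Upper 2.5% point of t with 8 degrees of freedom, from standard tables
# (assumed here; not given in the source text).
t_crit = 2.306

half_width = t_crit * sqrt(var_y_hat)
ci = (y_hat - half_width, y_hat + half_width)

print(f"half-width = {half_width:.3f}")
print(f"95% CI for the mean response at x = 11: ({ci[0]:.3f}, {ci[1]:.3f})")
```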
(v) The width of the interval depends on the new x value only through V(yhat), via the term
(x_new - xbar)^2. Because x_new = 3.5 is closer to xbar = 4.5 than x = 11 is, this term is
smaller, so the interval will be narrower.
[2]
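Although the question asks for no calculation, the comparison can be checked numerically. This sketch evaluates V(yhat) at both x values using the quantities from part (iv); the function name `var_mean_response` is an assumption of this example.

```python
from math import sqrt

# Quantities from part (iv) of the solution
n, xbar, sxx, sigma2_hat = 10, 4.5, 82.5, 0.139

def var_mean_response(x_new: float) -> float:
    """Estimated variance of the predicted mean response at x_new."""
    return (1 / n + (x_new - xbar) ** 2 / sxx) * sigma2_hat

for x_new in (3.5, 11):
    v = var_mean_response(x_new)
    print(f"x_new = {x_new:>4}: V(yhat) = {v:.4f}, sd = {sqrt(v):.4f}")

# The interval width is proportional to sqrt(V(yhat)) for a fixed confidence level.
width_ratio = sqrt(var_mean_response(3.5) / var_mean_response(11))
print(f"width at x = 3.5 relative to width at x = 11: {width_ratio:.2f}")
```

The variance is smallest at x_new = xbar, where only the 1/n term remains, and grows quadratically as x_new moves away from xbar.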