CS1 Assignment Book
CS1
ACTUARIAL STATISTICS
ASSIGNMENTS & SOLUTIONS
FOR 2023
INDEX
CHAPTERS
BASICS ASSIGNMENT
ASSIGNMENT 1
ASSIGNMENT 2
ASSIGNMENT 3
ASSIGNMENT 4
ASSIGNMENT 5
SOLUTIONS
BASICS ASSIGNMENT
ASSIGNMENT 1
ASSIGNMENT 2
ASSIGNMENT 3
ASSIGNMENT 4
ASSIGNMENT 5
BASICS ASSIGNMENT
CHAPTER
SUMMARISING DATA
QUESTION 1.
QUESTION 2.
The frequency table shows the number of claims made on 100 car insurance policies in the last year.
Calculate the mean number of claims per policy:
Frequency 74 19 5 2
QUESTION 3.
(i) The mean age of death of 12 assurance policyholders was 72. What was the total age of the 12
policyholders?
(ii) The mean of the following list of investment returns is 4.2%: 5%, 4.75%, 3.6%, x%, 3.25%. Find x.
(iii) A small department employs ten actuaries; their mean salary is £48,000. When an eleventh actuary
joins the department the mean salary of all the actuaries drops to £45,800. Find the salary of the new
employee.
(iv) The mean sum assured on 12 term assurances was £50,000 whereas the mean sum assured on 8
endowment assurances was £30,000. Calculate the mean sum assured on all 20 policies.
QUESTION 4.
A list of the age last birthday at death of 30 male policyholders who held life assurance policies with a
particular company is given:
57 68 75 66 72 86 80 81 70 78 76 72 88 84 69
77 83 90 48 63 74 81 94 51 73 96 81 66 77 101
Find the median age last birthday at death of these policyholders.
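The median of an even-sized sample is the average of the two middle order statistics. A minimal Python sketch for the 30 ages listed above (the variable names are illustrative only):

```python
from statistics import median

# Ages last birthday at death, copied from the question.
ages = [57, 68, 75, 66, 72, 86, 80, 81, 70, 78, 76, 72, 88, 84, 69,
        77, 83, 90, 48, 63, 74, 81, 94, 51, 73, 96, 81, 66, 77, 101]

# With 30 values the median is the mean of the 15th and 16th sorted values.
ordered = sorted(ages)
middle_pair = (ordered[14], ordered[15])
median_age = median(ages)

print(middle_pair, median_age)
```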
QUESTION 5.
The frequency table shows the number of claims (of a particular type) made each week in the last year.
Frequency 5 7 15 12 9 4
QUESTION 6.
Here are the salaries of 7 individuals in a company (in £000’s):
18 21 25 25 25 25 30
Find the second order sample moment of this data set.
QUESTION 7.
Here are the salaries of 7 individuals in a company (in £000’s):
18 21 24 25 25 25 30
Find the third order central sample moment of this data set.
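Questions 6 and 7 can be checked with a short Python sketch. It assumes the sample moments are defined with divisor n (not n - 1); the helper names `raw_moment` and `central_moment` are hypothetical and not taken from the course notes.

```python
def raw_moment(data, k):
    """k-th order sample moment about zero, using divisor n."""
    return sum(x ** k for x in data) / len(data)

def central_moment(data, k):
    """k-th order sample moment about the sample mean, using divisor n."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

salaries_q6 = [18, 21, 25, 25, 25, 25, 30]   # Question 6 data (in £000s)
salaries_q7 = [18, 21, 24, 25, 25, 25, 30]   # Question 7 data (in £000s)

second_raw = raw_moment(salaries_q6, 2)
third_central = central_moment(salaries_q7, 3)

print(second_raw, third_central)
```

A negative third central moment indicates that the Question 7 data are negatively skewed about their mean of 24.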
QUESTION 8.
02334445
QUESTION 9.
What would the position of the mean, mode and median be for a negatively skew distribution?
QUESTION 10.
Given that:
$N = 100$, $\sum (x_i - \bar{x})^2 = 856{,}934.91$, $\sum (x_i - \bar{x})^3 = 11{,}949{,}848.3946$
Calculate the:
QUESTION 11.
This frequency table shows the number of claims per policy made to a car insurance company in the last
year. Calculate the mean and the standard deviation of the number of claims per policy:
Frequency 74 19 5 2
QUESTION 12.
(i) A group of 12 actuaries are weighed. Their weights have a mean of 78 kg and a standard deviation of 4
kg. Find the sum of squares (ie $\sum x^2$) of their weights.
(ii) The temperature over the previous 6 days had a mean of 19°C and a standard deviation of 5°C.
Today's temperature was 16°C. Calculate the mean and standard deviation of the temperature over all 7
days.
(iii) The ages at which a group of 10 male policyholders died had a mean of 72 years and a standard
deviation of 7 years. The ages at which a group of 8 female policyholders died had a mean of 78 years and a
standard deviation of 9 years.
QUESTION 13.
38024
CHAPTER
BASIC PROBABILITY
QUESTION 14.
A pile of 15 scripts contains two CT3’s, three CT4’s, four CA1’s and six ST5’s. A marker picks a script from
the pile at random.
QUESTION 15.
In a CT3 tutorial there are 11 students of which 6 are female. Three of the women and 2 of the men are also
taking CT4. What is the probability that a student picked at random:
QUESTION 16.
The probability that a car claim to a certain company is in excess of £1,000 is 0.6. What is the probability
that a claim is not in excess of £1,000?
QUESTION 17.
In a portfolio of 50 car insurance policyholders, 6 have “4 years no claims bonus”, 15 have “3 years no
claims bonus”, 18 have “2 years no claims bonus”, 7 have just “one year no claims bonus” and the rest have
none. A policyholder is picked at random, what is the probability that they have:
QUESTION 18.
On a Friday night the probability that a driver has been drinking is 0.2. If a driver has been drinking the
probability that they have an accident is 0.05; otherwise it is 0.0001.
(i) Calculate the probability that a driver chosen at random on a Friday night has an accident.
(ii) At an accident on a Friday night the police carry out a breath test. What is the probability that the driver
has been drinking?
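Both parts of Question 18 follow from the law of total probability and Bayes' theorem. The sketch below is one way to lay out the arithmetic in Python; the variable names are illustrative.

```python
# Inputs from the question.
p_drinking = 0.2
p_accident_given_drinking = 0.05
p_accident_given_sober = 0.0001

# (i) Law of total probability.
p_accident = (p_drinking * p_accident_given_drinking
              + (1 - p_drinking) * p_accident_given_sober)

# (ii) Bayes' theorem: P(drinking | accident).
p_drinking_given_accident = p_drinking * p_accident_given_drinking / p_accident

print(round(p_accident, 5), round(p_drinking_given_accident, 4))
```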
QUESTION 19.
A blood test for a particular type of cancer is 95% accurate for a patient with the cancer and 98% accurate
for a healthy patient. If only 6% of those actually tested have the cancer, calculate the probability that:
(iii) a patient who gets a positive result actually has the cancer.
QUESTION 20.
The probability that a car accident is due to faulty brakes is 0.02, the probability that a car accident is
correctly attributed to faulty brakes is 0.95, and the probability that a car accident is incorrectly attributed to
faulty brakes is 0.01.
Calculate the probability that a car accident, which is attributed to faulty brakes, was due to faulty brakes.
QUESTION 21.
An insurance company insured 6000 scooter-drivers, 3000 car-drivers and 9000 truck-drivers. The
probability of an accident involving a scooter, a car and a truck is 0.02, 0.06 and 0.30 respectively. One of the
insured persons meets with an accident. Find the probability that he is a car-driver.
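Questions 19(iii), 20 and 21 are all posterior probabilities of the same shape, so a single helper covers them. The function `posterior` below is a hypothetical illustration. It assumes that "98% accurate for a healthy patient" means a 2% false positive rate, and that the 0.95 and 0.01 in Question 20 are the probabilities of attribution to faulty brakes given that the brakes were, or were not, the true cause.

```python
def posterior(priors, likelihoods, index):
    """Bayes' theorem over a partition: P(cause[index] | evidence)."""
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    return joint[index] / sum(joint)

# Q19(iii): P(cancer | positive test).
q19 = posterior([0.06, 0.94], [0.95, 0.02], 0)

# Q20: P(faulty brakes | attributed to faulty brakes).
q20 = posterior([0.02, 0.98], [0.95, 0.01], 0)

# Q21: P(car-driver | accident), priors proportional to numbers insured.
insured = [6000, 3000, 9000]
priors = [count / sum(insured) for count in insured]
q21 = posterior(priors, [0.02, 0.06, 0.30], 1)

print(round(q19, 4), round(q20, 4), round(q21, 4))
```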
QUESTION 22.
One card from a pack of 52 cards is lost. Two cards are drawn from this pack and found to be both
diamonds. Find the probability that the lost card is a diamond.
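For the lost card problem, exact fractions avoid rounding. The sketch below conditions on whether the lost card was a diamond and applies Bayes' theorem; the helper name `p_two_diamonds` is hypothetical.

```python
from fractions import Fraction
from math import comb

def p_two_diamonds(diamonds_left):
    """P(both drawn cards are diamonds) from the remaining 51 cards."""
    return Fraction(comb(diamonds_left, 2), comb(51, 2))

prior_diamond = Fraction(13, 52)

# If the lost card is a diamond, 12 diamonds remain; otherwise all 13 remain.
numerator = prior_diamond * p_two_diamonds(12)
denominator = numerator + (1 - prior_diamond) * p_two_diamonds(13)
posterior_diamond = numerator / denominator

print(posterior_diamond)
```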
CHAPTER
RANDOM VARIABLE
QUESTION 23.
Write down the probability function for the number of heads obtained when flipping two coins.
QUESTION 24.
$P(X = x) = cx^2$, $x = 1, 2, 3$ or $4$
QUESTION 25.
W 2 4 5
QUESTION 26.
$$F_V(v) = \begin{cases} 0 & v < 1 \\ 0.216 & 1 \le v < 2 \\ 0.648 & 2 \le v < 3 \\ 0.936 & 3 \le v < 4 \\ 1 & v \ge 4 \end{cases}$$
Find:
QUESTION 27.
X 4 6 7 10
QUESTION 28.
W 2 4 5
Calculate:
(i) $E(W^2)$ (ii) $E(5W - 2)$ (iii) $E(1/W)$
QUESTION 29.
Value of x   0   1   2    3    4    5       6        7
P(x)         0   k   2k   2k   3k   $k^2$   $2k^2$   $7k^2 + k$
QUESTION 30.
= 0, elsewhere
QUESTION 31.
= 0, elsewhere
CHAPTER
PROBABILITY DISTRIBUTION
QUESTION 32.
Eight coins are tossed. Find the probability of getting (i) two heads (ii) no head and (iii) at least two heads.
QUESTION 33.
For a binomial distribution, the mean is 3 and variance is 2. Find the values of n and p. Hence find the
probability that X is 5.
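For a binomial distribution the mean is np and the variance is np(1 - p), so the ratio variance/mean gives 1 - p directly. A minimal Python sketch of that route follows (variable names are illustrative):

```python
from math import comb

mean, variance = 3, 2

# variance / mean = 1 - p, then n = mean / p.
p = 1 - variance / mean
n = round(mean / p)

# Binomial probability P(X = 5).
prob_5 = comb(n, 5) * p ** 5 * (1 - p) ** (n - 5)

print(n, round(p, 4), round(prob_5, 4))
```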
QUESTION 34.
A local electrical appliances shop has found from experience that the demand for tube lights is distributed
as Poisson with mean of 4 tube lights per week. If the shop keeps 6 tubes during a particular week, what is
the probability that the demand will exceed supply during that week?
QUESTION 35.
If 5% of the electric bulbs manufactured by a company are defective, use Poisson distribution to find the
probability that in a sample of 100 bulbs (i) none is defective, (ii) 5 bulbs will be defective.
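Questions 34 and 35 only need the Poisson probability function. The sketch below evaluates it directly in Python; `poisson_pmf` is a hypothetical helper, and Question 35 uses the Poisson mean 100 × 0.05 = 5 as the question instructs.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu ** k / factorial(k)

# Q34: demand exceeds a stock of 6 tubes when X >= 7, with mean 4.
p_demand_exceeds = 1 - sum(poisson_pmf(k, 4) for k in range(7))

# Q35: Poisson approximation with mean n * p = 100 * 0.05.
mu_defective = 100 * 0.05
p_none_defective = poisson_pmf(0, mu_defective)
p_five_defective = poisson_pmf(5, mu_defective)

print(round(p_demand_exceeds, 4),
      round(p_none_defective, 4),
      round(p_five_defective, 4))
```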
QUESTION 36.
A sample of 100 dry battery cells tested to find the length of life, have mean 12 hours and standard
deviation 3 hours. Assuming the data are normally distributed, what percentage of battery cells are expected to
have life (i) more than 15 hours, (ii) less than 6 hours and (iii) between 10 and 14 hours?
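The three normal probabilities in Question 36 can be checked against tables with Python's standard library. This is a sketch only; it treats the sample mean and standard deviation as the parameters of the normal distribution, as the question implies.

```python
from statistics import NormalDist

life = NormalDist(mu=12, sigma=3)   # battery life in hours

p_more_than_15 = 1 - life.cdf(15)
p_less_than_6 = life.cdf(6)
p_between_10_and_14 = life.cdf(14) - life.cdf(10)

# Expressed as percentages of cells.
for p in (p_more_than_15, p_less_than_6, p_between_10_and_14):
    print(f"{100 * p:.2f}%")
```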
QUESTION 37.
A company has a portfolio of 50 high-risk car insurance policies. The number of claims per policy in a 3-
month period has a Poisson distribution with mean 0.5. It is assumed that all of the policies in the portfolio
are independent.
QUESTION 38.
Claim amounts for a particular type of medical negligence are lognormally distributed with mean £15,000
and standard deviation £8,000. Calculate the probability that the next claim exceeds £20,000
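For Question 38 the lognormal parameters are first recovered from the mean and standard deviation, using mean = exp(mu + sigma^2/2) and (sd/mean)^2 = exp(sigma^2) - 1, and the tail probability then reduces to a normal probability for log X. The names `mu` and `sigma2` below are this sketch's labels for the lognormal parameters.

```python
from math import log, sqrt
from statistics import NormalDist

mean, sd = 15_000, 8_000

# Invert the lognormal moment formulae.
sigma2 = log(1 + (sd / mean) ** 2)
mu = log(mean) - sigma2 / 2

# P(X > 20,000) = P(log X > log 20,000) with log X ~ N(mu, sigma2).
p_exceeds = 1 - NormalDist(mu, sqrt(sigma2)).cdf(log(20_000))

print(round(mu, 4), round(sigma2, 4), round(p_exceeds, 4))
```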
QUESTION 39.
2
If the random variable x has a distribution with five degrees of freedom, calculate:
QUESTION 40.
A small voting district has 101 female voters and 95 male voters. A random sample of 10 voters is drawn.
What is the probability exactly 7 of the voters will be female?
QUESTION 41.
An oil company conducts a geological study that indicates that an exploratory oil well should have a 20%
chance of striking oil. What is the probability that the first strike comes on the third well drilled?
QUESTION 42.
Calculate P(X < 8) if:
(i) X is the number of claims reported in a year by 20 policyholders. Each policyholder makes claims at
the rate of 0.2 per year independently of the other policyholders.
(ii) X is the number of claims examined up to and including the fourth claim that exceeds £20,000. The
probability that any claim received exceeds £20,000 is 0.3 independently of any other claim.
(iii) X is the number of deaths amongst a group of 500 policyholders. Each policyholder has a 0.01
probability of dying in the coming year independently of any other policyholder.
(iv) X is the number of phone calls made before an agent makes the first sale. The probability that any
phone call leads to a sale is 0.01 independently of any other call.
ASSIGNMENT 1
CHAPTER
GENERATING FUNCTIONS
QUESTION 1.
QUESTION 2.
Derive the MGF of the random variable X with probability density function
$f(x) = \tfrac{1}{2}(1 + x)$, $-1 < x < 1$
QUESTION 3.
Calculate the mean and variance of a random variable, X, with MGF given by:
$M_X(t) = \left(1 - \dfrac{t}{5}\right)^{-1}$, $t < 5$
QUESTION 4.
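State the CGF of X where $X \sim \text{Gamma}(\alpha, \lambda)$. Hence prove that $E(X) = \dfrac{\alpha}{\lambda}$, $\operatorname{var}(X) = \dfrac{\alpha}{\lambda^2}$ and $\operatorname{skew}(X) = \dfrac{2\alpha}{\lambda^3}$.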
QUESTION 5.
If X follows the gamma distribution with parameters 2 and 0.4 , calculate P X 10 using direct
integration.
QUESTION 6.
(i) Determine the moment generating function of the two-parameter exponential random variable X,
defined by the probability density function:
x
f x e , x where , 0
(ii) Hence, or otherwise, determine the mean and variance of the random variable X.
QUESTION 7.
(i) Derive an expression for the moment generating function of $2X + 3$ in terms of $M_X(t)$.
Now suppose that X is normally distributed with mean $\mu$ and variance $\sigma^2$.
QUESTION 8.
$M_Y(t) = (1 - 4t)^{-2}$, $t < 0.25$
Calculate:
(i) E(Y)
.
(iii) E Y
6
QUESTION 9.
$P(U = u) = pq^{u-1}$, $u = 1, 2, 3, \ldots$ where $p + q = 1$
(ii) Write down the CGF of U, and hence show that E(U)= 1/p .
QUESTION 10.
$f(x) = ke^{-2|x|}$, $x \in \mathbb{R}$
(b) State the values of t for which the formula in part (i)(a) is valid.
QUESTION 11.
(i) Derive, from first principles, the moment generating function of a Gamma($\alpha$, $\lambda$) random variable.
(ii) Show, using the moment generating function, that the mean and variance of a Gamma($\alpha$, $\lambda$) random
variable are $\alpha/\lambda$ and $\alpha/\lambda^2$, respectively.
QUESTION 12.
The claim amount X in units of £1,000 for a certain type of industrial policy is modelled as a gamma varia-
1 2
(i) Use moment generating functions to show that X ~ 6 .
2
(ii) Calculate the probability that a randomly chosen claim amount exceeds £20,000.
CHAPTER
JOINT DISTRIBUTION
QUESTION 13.
                 M
         1      2      3      4
    1    2/35   4/35   6/35   8/35
N   2    1/35   2/35   3/35   4/35
    3    1/70   1/35   3/70   2/35
(i) P(M= 3, N = 1 or 2)
(ii) P(N = 3)
QUESTION 14.
The continuous random variables U and V have joint probability density function:
$f_{U,V}(u, v) = \dfrac{2u + v}{3{,}000}$, where $10 < u < 20$ and $-5 < v < 5$
QUESTION 15.
Determine the marginal probability density functions for U and V, where:
$f_{U,V}(u, v) = \dfrac{2u + v}{3{,}000}$, for $10 < u < 20$ and $-5 < v < 5$
QUESTION 16.
Let X and Y have joint density function:
$f(x, y) = \dfrac{1}{16}(x + 3y)$, $0 < x < 2$, $0 < y < 2$
QUESTION 17.
N1
Calculate the expected value of , where the joint distribution of M and N is:
M
         M = 1   M = 2   M = 3   M = 4
N = 1    2/35    4/35    6/35    8/35
N = 2    1/35    2/35    3/35    4/35
N = 3    1/70    1/35    3/70    2/35
ie $P(M = m, N = n) = \dfrac{m}{35 \times 2^{n-2}}$.
QUESTION 18.
U and V have joint density function:
$f_{U,V}(u, v) = \dfrac{2u + v}{3{,}000}$, where $10 < u < 20$ and $-5 < v < 5$
CA PRAVEEN PATWARI 18 JAI SHREE RAM
CS1 ASSIGNMENTS & SOLUTIONS ACTUATORS EDUCATIONAL INSTITUTE
QUESTION 19.
Calculate the covariance of the random variables X and Y whose joint distribution is as follows:
0 1 2
1 0.1 0.1 0
QUESTION 20.
Calculate the correlation coefficient of U and V, where:
$f_{U,V}(u, v) = \dfrac{2u + v}{3{,}000}$, where $10 < u < 20$ and $-5 < v < 5$
You are given that $E(U) = \dfrac{140}{9}$ and $E(V) = \dfrac{5}{18}$.
QUESTION 21.
If X ~ Poi and Y ~ Poi are independent random variables, obtain the probability function of Z =X + Y.
QUESTION 22.
2 2 2
The random variables X, Y, and Z have means and variances X 4, Y 5, Z 6, X 1, Y 4 and Z 3 .
The covariances are as follows:
cov X, Y 3 cov X,Z 2 cov Y,Z 1
QUESTION 23.
A company has three telephone lines coming into its switchboard. The first line rings on average 3.5 times
per half-hour, the second rings on average 3.9 times per half-hour, and the third line rings on average 2.1
times per half-hour. Assuming that the numbers of calls are independent random variables having Poisson
distributions, calculate the probability that in half an hour the switchboard will receive:
QUESTION 24.
If the number of minutes it takes for a mechanic to check a tyre is a random variable having an exponential
distribution with mean 5, obtain the probability that the mechanic will take:
QUESTION 25.
$f(x, y) = c(x + 3y)$, $0 < x < 2$, $0 < y < 2$
QUESTION 26.
$f(x, y) = 2$, $x + y < 1$, $x > 0$, $y > 0$
(ii) Derive the conditional PDF of X given Y = y using the result from part (i)
QUESTION 27.
$f(x, y) = \dfrac{1}{6}(x^2 + xy)$, $0 < y < x < 2$
QUESTION 28.
$f_{X,Y}(x, y) = \dfrac{4}{5}(3x^2 + xy)$, $0 < x < 1$, $0 < y < 1$
Determine:
QUESTION 29.
Calculate the correlation coefficient of X and Y, where X and Y have the joint distribution:
0 1 2
1 0.1 0.1 0
QUESTION 30.
Claim sizes on a home insurance policy are normally distributed about a mean of £800 and with a standard
deviation of £100. Claims sizes on a car insurance policy are normally distributed about a mean of £1,200
and with a standard deviation of £300. All claim sizes are assumed to be independent.
To date, there have already been home claims amounting to £800, but no car claims.
Calculate the probability that after the next 4 home claims and 3 car claims the total size of car claims
exceeds the total size of the home claims.
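Because sums and differences of independent normal variables are normal, Question 30 reduces to one normal probability. The sketch below treats the £800 of home claims already paid as a known constant added to the future home total, which is this sketch's reading of the question.

```python
from math import sqrt
from statistics import NormalDist

# Future home claims: 4 claims, each N(800, 100^2), plus 800 already paid.
home_mean = 800 + 4 * 800
home_var = 4 * 100 ** 2

# Future car claims: 3 claims, each N(1200, 300^2).
car_mean = 3 * 1200
car_var = 3 * 300 ** 2

# D = car total - home total is normal; variances add under independence.
difference = NormalDist(car_mean - home_mean, sqrt(car_var + home_var))
p_car_exceeds_home = 1 - difference.cdf(0)

print(round(p_car_exceeds_home, 4))
```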
QUESTION 31.
Let X be a random variable with mean 3 and standard deviation 2, and let Y be a random variable with
mean 4 and standard deviation 1. X and Y have a correlation coefficient of –0.3 Let Z = X + Y.
Calculate:
(i) cov(X, Z)
(ii) var(Z).
QUESTION 32.
X has a Poisson distribution with mean 5 and Y has a Poisson distribution with mean 10. If cov(X, Y) = –12,
calculate the variance of Z where Z = X – 2Y + 3.
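Questions 31 and 32 both apply the rules cov(X, X + Y) = var(X) + cov(X, Y) and var(aX + bY + c) = a² var(X) + b² var(Y) + 2ab cov(X, Y). A short Python check of the arithmetic follows; the variable names are illustrative.

```python
# Question 31: sd(X) = 2, sd(Y) = 1, correlation -0.3, Z = X + Y.
sd_x, sd_y, rho = 2, 1, -0.3
cov_xy = rho * sd_x * sd_y
cov_xz = sd_x ** 2 + cov_xy                  # cov(X, X + Y)
var_z_q31 = sd_x ** 2 + sd_y ** 2 + 2 * cov_xy

# Question 32: Poisson variances equal the means, Z = X - 2Y + 3.
var_x, var_y, cov_q32 = 5, 10, -12
a, b = 1, -2                                 # the constant 3 does not affect the variance
var_z_q32 = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_q32

print(round(cov_xz, 2), round(var_z_q31, 2), var_z_q32)
```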
QUESTION 33.
For a certain company, claim sizes on car policies are normally distributed about a mean of £1,800 and with
standard deviation £300, whereas claim sizes on home policies are normally distributed about a mean of
£1,200 and with standard deviation £500. Claim sizes are assumed to be independent.
Calculate the probability that a car claim is at least twice the size of a home claim.
CHAPTER
CONDITIONAL EXPECTATION
QUESTION 34.
Two random variables X and Y have the following discrete joint distribution
10 20 30
Calculate E(Y | X = 1)
QUESTION 35.
$f(x, y) = \dfrac{3}{5}x(x + y)$, $0 < x < 1$, $0 < y < 2$
QUESTION 36.
fX ,Y x, y
1
6
2
x x xy 0 y x 2
QUESTION 37.
(i) Calculate E(Y) from first principles given that the joint density function of X and Y is:
$f(x, y) = \dfrac{3}{5}x(x + y)$, $0 < x < 1$, $0 < y < 2$
(ii) Given that $E(Y \mid X = x) = \dfrac{3x + 4}{3(x + 1)}$, calculate $E[E(Y \mid X)]$.
QUESTION 38.
10 20 30
QUESTION 39.
$E(Y) = E[E(Y \mid X)]$
The random variable X follows the gamma distribution with parameters $\alpha = 3$ and $\lambda = 2$. Y is a related
variable with conditional mean and variance of:
$E(Y \mid X = x) = 3x + 1$, $\operatorname{var}(Y \mid X = x) = 2x^2 + 5$
QUESTION 40.
Suppose that X is a standard normal random variable, and the conditional distribution of a Poisson random
variable Y, given the value of X = x, has expectation $x^2 + 1$.
QUESTION 41.
Two discrete random variables, X and Y, have the following joint probability function:
1 2 3 4
fU,V u, v 6 2uv u
2
0 u v 1
ASSIGNMENT 2
CHAPTER
QUESTION 2.
The cost of repairing a vehicle following an accident has mean $6,200 and standard deviation $650. A study
was carried out into 65 vehicles that had been involved in accidents. Calculate the probability that the total
repair bill for the vehicles exceeded $400,000.
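By the central limit theorem the total of 65 independent repair costs is approximately normal with mean 65 × 6,200 and standard deviation 650 × √65. A minimal Python sketch of the resulting probability, assuming the costs are independent and identically distributed:

```python
from math import sqrt
from statistics import NormalDist

n, mu, sigma = 65, 6200, 650

# CLT approximation for the total repair bill.
total = NormalDist(n * mu, sigma * sqrt(n))
p_exceeds = 1 - total.cdf(400_000)

print(round(p_exceeds, 4))
```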
QUESTION 3.
Let X be a Poisson variable with parameter 20. Use the normal approximation to obtain a value for
P X 15 and use tables to compare with the exact value.
QUESTION 4.
The average number of calls received per hour by an insurance company's switchboard is 5. Calculate the
probability that in a working day of eight hours, the number of telephone calls received will be:
(i) exactly 36
Assuming that the number of calls has a Poisson distribution, calculate the exact probabilities and also the
approximate probabilities using a normal approximation.
QUESTION 5.
Use a normal approximation to calculate an approximate value for the probability that an observation from
a Gamma(25, 50) random variable falls between 0.4 and 0.8.
QUESTION 6.
Calculate the approximate probability that the mean of a sample of 10 observations from a Beta(10,10)
random variable falls between 0.48 and 0.52.
QUESTION 7.
If X follows the gamma distribution with parameters 10 and 0.2 , calculate the probability that X
exceeds 80.
QUESTION 8.
The probability of any given policy in a portfolio of term assurance policies lapsing before it expires is
considered to be 0.15. Consider a random sample of 100 such policies.
Calculate the approximate probability that more than 20 policies will lapse before they expire.
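The number of lapses is Bin(100, 0.15), which can be approximated by a normal distribution with the same mean and variance. The sketch below applies a continuity correction (more than 20 becomes more than 20.5) and also sums the exact binomial tail for comparison; whether the intended solution uses the correction is an assumption.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.15
mean = n * p
sd = sqrt(n * p * (1 - p))

# Normal approximation to P(X > 20) = P(X >= 21), corrected to 20.5.
approx = 1 - NormalDist(mean, sd).cdf(20.5)

# Exact binomial tail probability for comparison.
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(21, n + 1))

print(round(approx, 4), round(exact, 4))
```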
QUESTION 9.
A company issues questionnaires to clients to obtain feedback on the clarity of their brochure. It is thought
that 5% of clients do not find the brochure helpful.
Let N denote the number of clients who do not find the brochure helpful in a sample of 1,000 responses.
QUESTION 10.
In a certain large population 45% of people have blood group A. A random sample of 300 individuals is
chosen from this population.
Calculate an approximate value for the probability that more than 115 of the sample have blood group A.
QUESTION 11.
Consider a random sample of size 16 taken from a normal distribution with mean $\mu = 25$ and variance
$\sigma^2 = 4$. Let the sample mean be denoted $\bar{X}$.
State the distribution of $\bar{X}$ and hence calculate the probability that $\bar{X}$ assumes a value greater than 26.
QUESTION 12.
Suppose that the sums assured under policies of a certain type are modelled by a distribution with mean
£8,000 and standard deviation £3,000. Consider a group of 100 independent policies of this type.
Calculate the approximate probability that the total sum assured under this group of policies exceeds
£845,000.
QUESTION 13.
A computer routine selects one of the integers 1, 2, 3, 4, 5 at random and replicates the process a total of
100 times. Let S denote the sum of the 100 numbers selected.
Calculate the approximate probability that S assumes a value between 280 and 320 inclusive.
CHAPTER
QUESTION 15.
Calculate the probability that, for a random sample of 5 values taken from a $N(100, 25^2)$ population:
(i) $\bar{X}$ will be between 80 and 120
QUESTION 16.
Determine:
QUESTION 17.
For random samples of size 10 and 25 from two normal populations with equal variances, use the F distri-
S2 S2
bution to determine the values of and such that P 12 0.05 and P 12 0.05 , where subscript
S S
2 2
1 represents the sample of size 10 and subscript 2 represents the sample of size 25.
QUESTION 18.
A random sample of 10 observations is drawn from the normal distribution with mean and standard
deviation 15. Independently, a random sample of 25 observations is drawn from the normal distribution
with mean and standard deviation 12. Let X and Y denote the respective sample means.
Evaluate P X Y 3 .
QUESTION 19.
(b) P F8,5 c 5%
QUESTION 20.
Let $X_1, X_2, \ldots, X_9$ be a random sample from a $N(0, \sigma^2)$ distribution. Let $\bar{X}$ and $S^2$ denote the sample mean
QUESTION 21.
House prices in region X are normally distributed with a mean of £100,000 and a standard deviation of
£10,000. House prices in region Y are normally distributed with a mean of £90,000 and a standard
deviation of £5,000. A random sample of 10 houses is taken from region X and a random sample of 5 houses from
region Y.
(i) the region X sample mean is greater than the region Y sample mean
(ii) the difference between the sample means is less than £5,000
(iii) the region X sample variance is less than the region Y sample variance
(iv) the region X sample standard deviation is more than four times greater than the region Y sample
standard deviation
QUESTION 22.
The time taken to process simple home insurance claims has a mean of 20 mins and a standard deviation of
5 mins.
(i) the sample mean of the times to process 5 claims is less than 15 mins
(ii) the sample mean of the times to process 50 claims is greater than 22 mins
(iii) the sample variance of the time to process 5 claims is greater than 6.65 mins
(iv) the sample standard deviation of the time to process 30 claims is less than 7 mins
(v) both (i) and (iii) occur for the same sample of 5 claims.
QUESTION 23.
A statistician suggests that, since a t variable with k degrees of freedom is symmetrical with mean 0 and
variance $\dfrac{k}{k-2}$ for $k > 2$, one can approximate the distribution using the normal variable $N\left(0, \dfrac{k}{k-2}\right)$.
(i) Use this to obtain an approximation for the upper 5% percentage points for a t variable with:
(ii) Compare your answers with the exact values from table and comment briefly on the result.
ASSIGNMENT 3
CHAPTER
POINT ESTIMATION
QUESTION 1.
A random sample from an Exp($\lambda$) distribution is as follows:
14.84, 0.19, 11.75, 1.18, 2.44, 0.53
Calculate the method of moments estimate for $\lambda$.
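The method of moments equates the population mean to the sample mean. The sketch below assumes the exponential parameter is a rate, so that the population mean is 1/rate; the name `rate_hat` is this sketch's label for the estimate.

```python
sample = [14.84, 0.19, 11.75, 1.18, 2.44, 0.53]

# Sample mean (first sample moment).
sample_mean = sum(sample) / len(sample)

# Setting 1 / rate equal to the sample mean gives the estimate.
rate_hat = 1 / sample_mean

print(round(sample_mean, 4), round(rate_hat, 4))
```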
QUESTION 2.
The random sample :
2.6, 1.9, 3.8, –4.1, –0.2, –0.7, 1.1, 6.9
is taken from a U , distribution.
QUESTION 3.
A random sample from a Bin(n, p)distribution yields the following values
4, 2, 7, 4, 1, 4, 5, 4
Calculate method of moments estimates of n and p.
QUESTION 4.
A random sample of size 10 from a Type 2 negative binomial distribution with parameters k and p is as
follows:
1, 1, 0, 1, 1, 1, 3, 2, 0, 5
QUESTION 5.
A random sample of size n (ie x1 , x 2 ,..., x n ) is taken from a Poi distribution.
(ii) The sum of a sample of 10 observations from a Poisson( ) distribution is 24. Calculate the maximum
likelihood estimate, ̂ .
QUESTION 6.
Claims (in £000s) on a particular policy have a distribution with PDF given by:
$f(x) = 2cxe^{-cx^2}$, $x > 0$
QUESTION 7.
The number of claims in a year on a pet insurance policy are distributed as follows
No. of claims, n 0 1 2 3
P(N = n) 5 3 1 9
Information from the claims file for a particular year showed that there were 60 policies with 1 claim, 24
policies with 2 claims and 16 policies with 3 or more claims. There was no information about the number of
policies with no claims.
Calculate the maximum likelihood estimate of .
QUESTION 8.
The number of claims, X, per year arising from a low-risk policy has a Poisson distribution with mean .
The number of claims, Y, per year arising from a high-risk policy has a Poisson distribution with mean 2 .
A sample of 15 low-risk policies had a total of 48 claims in a year and a sample of 10 high-risk policies had a
total of 59 claims in a year. Determine the maximum likelihood estimate of based on this information.
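With independent Poisson counts the log-likelihood depends on the data only through the claim totals, so the estimate has a closed form. In the sketch below `mu` is this sketch's label for the low-risk Poisson mean (the high-risk mean is then `2 * mu`), and constant terms of the log-likelihood are dropped.

```python
from math import log

low_policies, low_claims = 15, 48
high_policies, high_claims = 10, 59

def log_likelihood(mu):
    """Log-likelihood up to an additive constant; high-risk mean is 2 * mu."""
    return (low_claims * log(mu) - low_policies * mu
            + high_claims * log(2 * mu) - high_policies * 2 * mu)

# Setting the derivative to zero: (48 + 59) / mu = 15 + 2 * 10.
mu_hat = (low_claims + high_claims) / (low_policies + 2 * high_policies)

print(round(mu_hat, 4))
```

Evaluating `log_likelihood` slightly either side of `mu_hat` gives smaller values, which is a quick check that the stationary point is a maximum.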
QUESTION 9.
The estimator $\hat{\sigma}^2$ is used to estimate the variance of a $N(\mu, \sigma^2)$ distribution based on a random sample of n
observations:
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
(i) Determine the mean square error of $\hat{\sigma}^2$.
(ii) Determine whether $\hat{\sigma}^2$ is consistent.
QUESTION 10.
(i) Show that the CRLB for unbiased estimators of $\mu$, based on a random sample of n observations from a
$N(\mu, \sigma^2)$ distribution with known variance $\sigma^2$, is given by $\dfrac{\sigma^2}{n}$.
(ii) Show that the maximum likelihood estimator $\hat{\mu} = \bar{X}$ attains the CRLB.
QUESTION 11.
Waiting times in a post office queue have an Exp($\lambda$) distribution. Ten people had waiting times (in
minutes) of:
1.6 0.9 1.1 2.1 0.7 1.5 2.3 1.7 3.0 3.4
QUESTION 12.
The number of claims arising in a year on a certain type of insurance policy has a Poisson distribution with
parameter .
The insurer's claim file shows that claims were made on 238 policies during the last year with the following
frequency distribution for the number of claims:
1 174
2 50
3 10
4 4
5 0
No information is available from the policy file, that is, only data concerning those policies on which claims
were made can be used in the estimation of the claim rate (This is why there is no entry for the number
of claims being 0 in the table.)
x 1e
, where x is the mean number of claims for policies that have at least one claim.
(iii) Solve this equation, by any means, for the given data and calculate the resulting estimate of to two
decimal places.
(iv) Hence, estimate the percentage of all policies with no claims during the year.
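Parts (iii) and (iv) can be solved numerically. The sketch below assumes the likelihood equation is the standard one for a zero-truncated Poisson sample, namely x̄(1 − e^(−λ)) = λ, where `lam` denotes the Poisson parameter; it finds the root by bisection and then estimates the proportion of policies with no claims as e^(−λ).

```python
from math import exp, expm1

# Number of claims -> number of policies, from the claim file.
counts = {1: 174, 2: 50, 3: 10, 4: 4, 5: 0}
n_policies = sum(counts.values())
xbar = sum(k * f for k, f in counts.items()) / n_policies

def score(lam):
    """Zero when lam / (1 - exp(-lam)) equals the truncated sample mean."""
    return lam / -expm1(-lam) - xbar

# Bisection: score is increasing, negative near 0 and positive at 5.
lo, hi = 1e-6, 5.0
for _ in range(100):
    mid = (lo + hi) / 2
    if score(mid) < 0:
        lo = mid
    else:
        hi = mid

lam_hat = (lo + hi) / 2
p_no_claims = exp(-lam_hat)

print(round(lam_hat, 2), f"{100 * p_no_claims:.1f}%")
```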
QUESTION 13.
Suppose that unbiased estimators X1 and X 2 of a parameter have been determined by two independent
Let Y be the combination given by Y X1 X 2 , where and denote non-negative weights.
(i) Derive the relationship satisfied by and so that Y is also an unbiased estimator of .
2
(ii) Determine the variance of Y in terms of and if, additionally, the weights are chosen such that the
variance of Y is a minimum.
QUESTION 14.
A random sample x1 , x 2 ,..., x n is taken from a population, which has the probability distribution function
F(x) and the density function f(x). The values in the sample are arranged in order and the minimum and
maximum values x MIN and x MAX are recorded.
n
(i) Show that the distribution function of x MAX is F x , and find a corresponding formula for the dis-
The original distribution is now believed to be a Pareto ,1 distribution, ie the probability density
function is:
f x 1
, x 0
1 x
(ii) Determine the distribution function of X, and hence determine the distribution function of X MAX .
(iii) Show that the probability density function for the distribution of X MIN , is:
n
fX x n1
x 0
MIN
1 x
(v) Obtain an equation for the maximum likelihood estimator of using x MAX . Comment on the difficulty
of solving this equation.
(vi) Outline what further information you would need here in order to obtain a method of moments
estimate of .
QUESTION 15.
A discrete random variable has a probability function given by:
x 2 4 5
P X x
1 1 3
8
2 2
3 8
(i) Give the range of possible values for the unknown parameter .
(iii) Write down an expression for the likelihood of these data and hence show that the maximum
likelihood estimate ̂ satisfies the quadratic equation:
2 111 91
180ˆ ˆ 0
8 32
(iv) Hence determine the maximum likelihood estimate and explain why one root is rejected as a possible
estimate of .
QUESTION 16.
A motor insurance portfolio produces claim incidence data for 100,000 policies over one year. The table
below shows the observed number of policyholders making 0, 1, 2, 3, 4, 5, and 6 or more claims in a year.
0 87,889
1 11,000
2 1,000
3 100
4 10
5 1
6 —
Total 100,000
(i) (a) Estimate the parameter of the Poisson distribution to fit the above data using method of mo-
ments.
(b) Hence calculate the expected number of policies giving rise to the different numbers of claims
assuming the Poisson model.
(ii) Show that the estimate of the Poisson parameter calculated from the above data using the method of
moments is also the maximum likelihood estimate of this parameter.
(iii) (a) Estimate the two parameters of the Type 2 negative binomial distribution to fit the above data
using the method of moments.
(b) Hence calculate the expected number of policies giving rise to the different numbers of claims
assuming a negative binomial model.
k x 1
PX x q P X x 1
x
(iv) Explain briefly why you would expect a negative binomial distribution to fit the above data better than
a Poisson distribution.
CHAPTER
CONFIDENCE INTERVALS
QUESTION 17.
The average IQ of a random sample of 50 university students is found to be 132. Calculate a symmetrical
95% confidence interval for the average IQ of university students, assuming that IQs are normally
distributed. It is known from previous studies that the standard deviation of IQs among students is
approximately 20.
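With a known standard deviation the interval is the sample mean plus or minus z × σ/√n. A minimal Python sketch, taking σ = 20 as known:

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sigma = 50, 132, 20

# Upper 2.5% point of the standard normal distribution (about 1.96).
z = NormalDist().inv_cdf(0.975)

half_width = z * sigma / sqrt(n)
lower, upper = sample_mean - half_width, sample_mean + half_width

print(round(lower, 2), round(upper, 2))
```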
QUESTION 18.
The average IQ of a random sample of 50 university students is found to be 132. Calculate a symmetrical 99%
prediction interval for the average IQ of university students, assuming that IQs are normally distributed. It is
known from previous studies that the standard deviation of IQs among students is approximately 20.
QUESTION 19.
Calculate:
for the standard deviation of the heights of the children in the population based on the information given in
the last question.
QUESTION 20.
The heights of 10-year-old children are normally distributed. The heights of a random sample of five
children (in cm) are: 124, 122, 130, 125 and 132.
Calculate a 90% confidence interval for the predicted height of a 10-year-old child based on these data
values.
QUESTION 21.
We have obtained a value of 1 from the binomial distribution with parameters n = 20 and .
QUESTION 22.
In a one-year mortality investigation, 45 of the 250 ninety-year-olds present at the start of the investigation
died before the end of the year. Assuming that the number of deaths has a binomial distribution with
parameters n = 250 and q, calculate a symmetrical 90% confidence interval for the unknown mortality rate q.
QUESTION 23.
In a one-year investigation of claim frequencies for a particular category of motorists, the total number of
claims made under 5,000 policies was 800. Assuming that the number of claims made by individual
motorists has a Poi($\lambda$) distribution, calculate a symmetrical 90% confidence interval for the unknown average
claim frequency $\lambda$.
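Questions 22 and 23 both use a large-sample normal approximation: estimate plus or minus z × (estimated standard error). The helper `normal_interval` below is hypothetical, and the standard errors plug the point estimates into the binomial and Poisson variance formulae, which is one common approach rather than the only one.

```python
from math import sqrt
from statistics import NormalDist

def normal_interval(estimate, std_error, level):
    """Symmetrical large-sample interval: estimate +/- z * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error

# Q22: binomial proportion, 45 deaths out of 250 lives.
q_hat = 45 / 250
q_interval = normal_interval(q_hat, sqrt(q_hat * (1 - q_hat) / 250), 0.90)

# Q23: Poisson claim frequency, 800 claims from 5,000 policies.
freq_hat = 800 / 5000
freq_interval = normal_interval(freq_hat, sqrt(freq_hat / 5000), 0.90)

print([round(x, 4) for x in q_interval])
print([round(x, 4) for x in freq_interval])
```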
QUESTION 24.
A motor company runs tests to investigate the fuel consumption of cars using a newly developed fuel
additive. Sixteen cars of the same make and age are used, eight with the new additive and eight as controls. The
results, in miles per gallon over a test track under regulated conditions, are as follows:
Calculate a 95% confidence interval for the increase in miles per gallon achieved by cars with the additive.
State clearly any assumptions required for this analysis.
QUESTION 25.
In a one-year investigation of claim frequencies for a particular category of motorists, there were 150
claims from the 500 policyholders aged under 25 and 650 claims from the 4,500 remaining policyholders.
Assuming that the numbers of claims made by the individual motorists in each category have independent
Poisson distributions, calculate a 99% confidence interval for the difference between the two Poisson
parameters.
QUESTION 26.
A survey was carried out to find out the number of hours that actuarial students spend watching television
per week. It was discovered that for a sample of 10 students, the following times were spent watching
television:
8, 4, 7, 5, 9, 7, 6, 9, 5, 7
(i) (a) Calculate a symmetrical 95% confidence interval for the mean time an actuarial student spends
watching television per week.
(b) Write down the assumptions needed to calculate the confidence interval in part (a).
(ii) Calculate a symmetrical 95% prediction interval for the time an actuarial student spends watching
television per week.
(iii) (a) Describe the limiting case of the formulae for the intervals in parts (i)(a) and (ii) as n tends to
infinity.
(b) Explain which of the two intervals calculated will be more sensitive to the assumptions in part
(i)(b).
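Parts (i)(a) and (ii) differ only in the factor under the square root: 1/n for the mean, and 1 + 1/n for a single future observation. The sketch below hard-codes the tabulated upper 2.5% point of the t distribution with 9 degrees of freedom (2.262), because the Python standard library has no t quantile function; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

hours = [8, 4, 7, 5, 9, 7, 6, 9, 5, 7]
n = len(hours)
xbar = mean(hours)
s = stdev(hours)            # sample standard deviation, divisor n - 1

t_crit = 2.262              # t(9) upper 2.5% point, taken from tables

# (i)(a) 95% confidence interval for the mean.
ci_half = t_crit * s * sqrt(1 / n)
confidence_interval = (xbar - ci_half, xbar + ci_half)

# (ii) 95% prediction interval for one further student.
pi_half = t_crit * s * sqrt(1 + 1 / n)
prediction_interval = (xbar - pi_half, xbar + pi_half)

print([round(x, 2) for x in confidence_interval])
print([round(x, 2) for x in prediction_interval])
```

The prediction interval is much wider because it carries the variability of an individual observation as well as the uncertainty in the estimated mean.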
QUESTION 27.
A researcher investigating attitudes to Sunday shopping reports that, in a sample of 8 interviewees, 7 were
in favour of more opportunities to shop on Sunday.
Calculate an exact 95% confidence interval for the underlying proportion in favour of this idea using the
binomial distribution.
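An exact interval is found by solving the binomial tail equations directly: the lower limit is the p for which P(X ≥ 7) = 0.025 and the upper limit is the p for which P(X ≤ 7) = 0.025. The sketch below solves both by bisection; `binom_cdf` and `bisect_root` are hypothetical helpers written for this example.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

def bisect_root(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(lo) > 0) == (f(mid) > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, x, alpha = 8, 7, 0.05

# Lower limit: P(X >= 7 | p) = alpha / 2.
lower = bisect_root(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)

# Upper limit: P(X <= 7 | p) = alpha / 2, i.e. 1 - p^8 = 0.025.
upper = bisect_root(lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)

print(round(lower, 4), round(upper, 4))
```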
QUESTION 28.
Two inspectors carry out property valuations for an estate agency. Over a particular week they each go out
to similar properties. The table below shows their valuations (in £000s):
A 102 98 93 86 92 94 89 97
B 86 88 92 95 98 97 94 92 91
(i) (a) Comment on the possible assumption of normality and equal variances for the two underlying
populations using the diagrams.
(b) Calculate a 95% confidence interval for this common variance using the equal variance
assumption from part (a).
(c) Calculate a 95% confidence interval for the mean difference between the valuations by A and B,
commenting briefly on the result.
The estate agency employing the inspectors decides to test their valuations by sending them each to
the same set of eight houses, independently and without knowledge that the other is going. The result-
ing valuations (in £000s) follow:
Property
1 2 3 4 5 6 7 8
(ii) Calculate a 90% confidence interval for the mean difference between valuations by A and B, comment-
ing briefly on the result.
QUESTION 29.
The ordered remission times (in weeks) of 20 leukaemia patients are given in the table:
1 1 2 2 3
4 4 5 5 8
8 8 11 11 12
12 15 17 22 23
Suppose the remission times can be regarded as a random sample from an exponential distribution with
density:
$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x > 0$
QUESTION 30.
Heights of males with classic congenital adrenal hyperplasia (CAH) are assumed to be normally distributed.
(i) Determine the minimum sample size to ensure that a 95% confidence interval for the mean height has
a maximum width of 10cm, if:
(a) a previous sample has a standard deviation of 8.4 cm
(b) the population standard deviation is 8.4 cm.
(ii) Determine the minimum sample size to ensure that a 95% prediction interval for the height of a male
with CAH has a maximum width of 38cm, if:
(a) a previous sample has a standard deviation of 8.4 cm
(b) the population standard deviation is 8.4 cm.
(iii) Comment on the difference in sample size required for parts (i) and (ii).
QUESTION 31.
The amounts of individual claims arising under a certain type of general insurance policy are known from
past experience to conform to a lognormal distribution in which the standard deviation is 1.8 times the
mean. An actuary has found that the lower and upper limits of a 95% confidence interval for the mean
claim amount are £4,250 and £4,750.
Evaluate the lower and upper limits of a 95% confidence interval for the lognormal parameter $\mu$.
QUESTION 32.
A general insurance company is debating introducing a new screening programme to reduce the claim
amounts that it needs to pay out. The programme consists of a much more detailed application form that
takes longer for the new client department to process. The screening is applied to a test group of clients as
a trial whilst other clients continue to fill in the old application form. It can be assumed that claim payments
follow a normal distribution.
The claim payments data for samples of the two groups of clients are (in £100 per year)
Without screening 24.5 21.7 35.2 15.9 23.7 34.2 29.3 21.1 23.5 28.3
With screening 22.4 21.2 36.3 15.7 21.5 7.3 12.8 21.2 23.9 18.4
(i) (a) Calculate a 95% confidence interval for the difference between the mean claim amounts.
(ii) (a) Calculate a 95% confidence interval for the ratio of the population variances
(b) Hence, comment on the assumption of equal variances required in part (i).
Assume that the sample sizes taken from the clients with and without screening are always equal to
keep processing easy.
(iii) Calculate the minimum sample size so that the width of a 95% confidence interval for the difference
between mean claim amounts is less than 10, assuming that the samples have the same variances as in
part (i).
CHAPTER
HYPOTHESIS TESTING
QUESTION 33.
A random variable X is believed to follow an Exp($\lambda$) distribution. In order to test the null hypothesis $\mu = 20$
against the alternative hypothesis $\mu = 30$, where $\mu = 1/\lambda$, a single value is observed from the distribution.
If this value is less than 28, $H_0$ is not rejected, otherwise $H_0$ is rejected.
QUESTION 34.
A short screening test has just been developed for depression. An independent blind comparison was made
with a gold-standard test for diagnosis of depression among 200 psychiatric outpatients.
Among the 50 outpatients found to be depressed according to the gold-standard test, 35 patients tested
positive under the new short test. Among 150 patients found not to be depressed according to the gold-
standard test, 30 patients tested positive under the new short test.
Calculate the sensitivity and specificity of the short screening test, assuming that the gold-standard test
correctly classifies each individual.
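The two measures follow directly from the counts given. A minimal sketch, assuming the usual definitions (sensitivity is the proportion of truly depressed patients who test positive; specificity is the proportion of non-depressed patients who test negative):

```python
# Counts from the question, classified by the gold-standard test.
depressed = 50
depressed_positive = 35        # true positives on the short test
not_depressed = 150
not_depressed_positive = 30    # false positives on the short test

# Sensitivity: P(test positive | depressed)
sensitivity = depressed_positive / depressed

# Specificity: P(test negative | not depressed)
specificity = (not_depressed - not_depressed_positive) / not_depressed

print("Sensitivity:", sensitivity)
print("Specificity:", specificity)
```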
QUESTION 35.
A random variable X is believed to follow an Exp($\lambda$) distribution. In order to test the null hypothesis $\mu = 20$
against the alternative hypothesis $\mu = 30$, where $\mu = 1/\lambda$, a single value is observed from the distribution. If
this value of X is less than k, $H_0$ is not rejected, otherwise $H_0$ is rejected.
QUESTION 36.
The average IQ of a sample of 50 university students was found to be 105. Carry out a statistical test to con-
clude whether the average IQ of university students is greater than 100, assuming that IQs are normally
distributed. It is known from previous studies that the standard deviation of IQs among students is approx-
imately 20.
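A one-sided z test fits this setting, since the standard deviation is treated as known. The sketch below assumes the hypotheses are that the mean IQ equals 100 against the alternative that it exceeds 100; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 50        # sample size
x_bar = 105   # sample mean IQ
mu_0 = 100    # mean under the null hypothesis
sigma = 20    # standard deviation, treated as known

# Test statistic: (x_bar - mu_0) / (sigma / sqrt(n)), N(0, 1) under H0.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# One-sided p-value for the alternative "mean > 100".
p_value = 1 - NormalDist().cdf(z)

print("z statistic:", round(z, 4))
print("p-value:", round(p_value, 4))
```

A p-value below 5% but above 1% would mean the conclusion depends on the significance level chosen.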
QUESTION 37.
The annual rainfall in centimetres at a certain weather station over the last ten years has been as follows:
17.2 28.1 25.3 26.2 30.7 19.2 23.4 27.5 29.5 31.6
Scientists at the weather station wish to test whether the average annual rainfall has increased from its for-
mer long-term value of 22 cm. Test this hypothesis at the 5% level, stating any assumptions that you make.
QUESTION 38.
A new gene has been identified that makes carriers of it particularly susceptible to a particular degenera-
tive disease. In a random sample of 250 adult males born in the UK, 8 were found to be carriers of the dis-
ease. Test whether the proportion of adult males born in the UK carrying the gene is less than 10%.
QUESTION 39.
In a one-year investigation of claim frequencies for a particular category of motorists, the total number of
claims made under 5,000 policies was 800. Assuming that the number of claims made by individual motorists
has a Poi($\lambda$) distribution, test at the 1% level whether the unknown average claim frequency is less
than 0.175.
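Under the null hypothesis the total number of claims is Poisson with mean 5,000 × 0.175, which is large enough for a normal approximation. The sketch below assumes that approximation and, for simplicity, omits the continuity correction; applying one would move the statistic slightly towards zero.

```python
from math import sqrt
from statistics import NormalDist

n_policies = 5000
total_claims = 800
lambda_0 = 0.175  # claim frequency under the null hypothesis

# Under H0 the total is Poisson with mean (and variance) n * lambda_0.
expected_total = n_policies * lambda_0

# Normal approximation, no continuity correction (an assumption of this sketch).
z = (total_claims - expected_total) / sqrt(expected_total)

# One-sided p-value for the alternative "lambda < 0.175".
p_value = NormalDist().cdf(z)

print("z statistic:", round(z, 4))
print("p-value:", round(p_value, 5))
print("Reject H0 at the 1% level:", p_value < 0.01)
```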
QUESTION 40.
The average blood pressure for a control group C of 10 patients was 77.0 mmHg. The average blood pres-
sure in a similar group T of 10 patients on a special diet was 75.0 mmHg. Carry out a statistical test to as-
sess whether patients on the special diet have lower blood pressure.
You are given that $\sum_{i=1}^{10} c_i^2 = 59,420$ and $\sum_{i=1}^{10} t_i^2 = 56,390$.
QUESTION 41.
A car manufacturer runs tests to investigate the fuel consumption of cars using a newly developed fuel ad-
ditive. Sixteen cars of the same make and age are used, eight with the new additive and eight as controls.
The results, in miles per gallon over a test track under regulated conditions, are as follows:
Control 27.0 32.2 30.4 28.0 26.5 25.5 29.6 27.2
Additive 31.4 29.9 33.2 34.4 32.0 28.7 26.1 30.3
If $\mu_C$ is the mean number of miles per gallon achieved by cars in the control group, and $\mu_A$ is the mean
number of miles per gallon achieved by cars in the group with fuel additive, test:
(i) $H_0: \mu_A - \mu_C = 0$ vs $H_1: \mu_A - \mu_C \neq 0$
(ii) $H_0: \mu_A - \mu_C = 6$ vs $H_1: \mu_A - \mu_C \neq 6$
QUESTION 42.
The average blood pressure for a control group C of 10 patients was 77.0 mmHg. The average blood pres-
sure in a similar group T of 10 patients on a special diet was 75.0 mmHg. Test whether the variances in the
two populations can be considered to be equal.
You are given that $\sum_{i=1}^{10} c_i^2 = 59,420$ and $\sum_{i=1}^{10} t_i^2 = 56,390$.
QUESTION 43.
A sample of 100 claims on household policies made during the year just ended showed that 62 were due to
burglary. A sample of 200 claims made during the previous year had 115 due to burglary.
Test the hypothesis that the underlying proportion of claims that are due to burglary is higher in the second
year than in the first.
QUESTION 44.
In order to increase the efficiency with which employees in a certain organisation can carry out a task, 5
employees are sent on a training course. The time in seconds to carry out the task both before and after the
training course is given below for the 5 employees:
A B C D E
Before 42 51 37 43 45
After 38 37 32 40 48
Test whether the training course has had the desired effect.
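Because the same five employees are measured twice, a paired t test on the differences is the natural approach. The sketch below assumes the differences are normally distributed, takes the desired effect to be a reduction in time, and uses the tabulated upper 5% point of the t distribution with 4 degrees of freedom (2.132).

```python
from math import sqrt
import statistics

before = [42, 51, 37, 43, 45]
after = [38, 37, 32, 40, 48]

# Positive differences indicate a faster time after training.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample standard deviation of the differences

# Test statistic for H0: mean difference = 0 vs H1: mean difference > 0.
t_stat = d_bar / (s_d / sqrt(n))

# Assumption: upper 5% point of t with n - 1 = 4 degrees of freedom, from tables.
T_4_UPPER_5 = 2.132
reject_h0 = t_stat > T_4_UPPER_5

print("Differences:", diffs)
print("t statistic:", round(t_stat, 4))
print("Reject H0 at the 5% level:", reject_h0)
```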
QUESTION 45.
$H_0$: $P(X = i) = \frac{1}{6}$, i = 1, 2, 3, 4, 5, 6, where X is the number thrown
$H_1$: Number thrown does not have the distribution specified in the model
x: 1 2 3 4 5 6
$f_i$: 43 56 54 47 41 59
Carry out a test to assess whether the data comes from a fair die.
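A chi-square goodness-of-fit statistic for the fair-die model can be sketched as follows. The sketch assumes a 5% significance level and takes the tabulated upper 5% point of the chi-square distribution with 5 degrees of freedom (11.07).

```python
# Observed frequencies for faces 1 to 6 (from the question).
observed = [43, 56, 54, 47, 41, 59]

total = sum(observed)
expected = total / 6  # equal expected frequency for each face under H0

# Chi-square statistic: sum of (O - E)^2 / E over the six faces.
chi_sq = sum((o - expected) ** 2 / expected for o in observed)

# Assumption: upper 5% point of chi-square with 6 - 1 = 5 degrees of
# freedom, read from standard tables.
CHI2_5_UPPER_5 = 11.07
reject_h0 = chi_sq > CHI2_5_UPPER_5

print("Total throws:", total)
print("Chi-square statistic:", round(chi_sq, 4))
print("Reject H0 at the 5% level:", reject_h0)
```

No parameters are estimated from the data here, so the degrees of freedom are the number of cells minus one.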
QUESTION 46.
The table below shows the causes of death in elderly men derived from a study in the 1970s. Carry out a
chi-square test to determine whether these percentages can still be considered to provide an accurate de-
scription of causes of death in 2000.
Cancer 8% 286
QUESTION 47.
The numbers of claims made last year by individual motor insurance policyholders were:
Number of claims 0 1 2 3 4+
Carry out a chi-square test to determine whether these frequencies can be considered to conform to a Pois-
son distribution.
QUESTION 48.
On a particular run of a process which bottles a drink, it is thought that the cleansing process of the bottles
has partially failed. The bottles have been boxed into crates, each containing six bottles. It is thought that
each bottle, independently of all others, has the same chance of containing impurities.
A survey has been conducted, and each bottle in a random sample of 200 crates has been tested for impuri-
ties. The table below gives the numbers of crates in the sample which had the respective number of bottles
which contained impurities:
Number of crates: 38 70 58 25 6 2 1
QUESTION 49.
For each of three insurance companies, A, B, and C, a random sample of non-life policies of a particular kind
is examined. It turns out that a claim (or claims) have arisen in the past year in 23% of the sampled policies
for A, in 28% of those for B, and in 20% of those for C.
Test for differences in the underlying proportions of policies of this kind which have given rise to claims in
the past year among the three companies in the two situations:
(a) the sample sizes were 100, 100, and 200 respectively
(b) the sample sizes were 300, 300, and 600 respectively.
QUESTION 50.
In an investigation into the effectiveness of car seat belts, 292 accident victims were classified according to
the severity of their injuries and whether they were wearing a seat belt at the time of the accident. The re-
sults were as follows:
Death 3 47
Severe injury 78 32
Determine whether the severity of injuries sustained is dependent on whether the victims are wearing a
seat belt.
QUESTION 51.
The table below shows the numbers of births during one month at a particular hospital classified according
to whether a particular medical characteristic was or wasn't present during childbirth.
Characteristic present 10 12 9 4 3 38
Total 15 63 47 29 8 162
Assess whether the presence of this characteristic is dependent on the age of the mother.
QUESTION 52.
A statistical test is used to determine whether or not an anti-smoking campaign carried out 5 years ago has
led to a significant reduction in the mean number of smoking related illnesses. The probability value of the
test statistic is 7%.
(i) 10%
(ii) 5%.
QUESTION 53.
A random sample, $x_1, \ldots, x_{10}$, from a normal population gives the following values:
9.5 18.2 4.69 3.76 14.2 17.13 15.69 13.9 15.7 7.42
$\sum x_i = 120.19$, $\sum x_i^2 = 1,693.6331$
(i) Test at the 5% level whether the mean of the whole population is 15 if the variance is:
(a) unknown
(b) 20.
QUESTION 54.
A professional gambler has said: 'Flipping a coin into the air is fair, since the coin rotates about a horizontal
axis, and it is equally likely to be either way up when it first clips the ground. So a flicked coin is equally
likely to land showing heads or tails. However, spinning a coin on a table is not fair, since the coin rotates
about a vertical axis, and there is a systematic bias causing it to tilt towards the side where the embossed
pattern is heavier. In fact, when a new coin is spun, it is more than twice as likely to land showing tails as it
is to land showing heads.'
After hearing this, an experiment was carried out, spinning a new coin 25 times on a polished table; the
coin showed tails 18 times.
Comment on whether the results of the experiment support the gambler’s claims about the probability when a
coin is spun.
QUESTION 55.
A blood test has been used on 1,000 people to detect whether they have a particular condition. Of the 427
people who had a positive result, 369 of them had the condition. Of the 573 people who had a negative re-
sult, 15 of them had the condition.
A second blood test is used on 1,000 people which has a sensitivity of 80% and a specificity of 60%.
For this blood test, 544 people had a positive result.
QUESTION 56.
The lengths of a random sample of 12 worms of a particular species have a mean of 8.54 cm and standard
deviation of 2.97 cm. Let $\mu$ denote the mean length of a worm of this species. It is required to test:
$H_0: \mu = 7$ cm vs $H_1: \mu \neq 7$ cm
QUESTION 57.
A general insurance company is debating introducing a new screening programme to reduce the claim
amounts that it needs to pay out. The programme consists of a much more detailed application form that
takes longer for the new client department to process. The screening is applied to a test group of clients as
a trial whilst other clients continue to fill in the old application form. It can be assumed that claim payments
follow a normal distribution.
The claim payments data for samples of the two groups of clients are (in £100 per year)
Without screening 24.5 21.7 45.2 15.9 23.7 34.2 29.3 21.1 23.5 28.3
With screening 22.4 21.2 36.3 15.7 21.5 7.3 12.8 21.2 23.9 18.4
(i) Test the hypothesis that the new screening programme reduces the mean claim amount.
QUESTION 58.
The total claim amounts (in £m) for home and car insurance over a year for similar sized companies are
collected by an independent advisor:
(i) Test whether the mean home and car claims are equal. State clearly your probability value.
It was subsequently discovered that the results were actually 5 consecutive years from the same com-
pany.
(ii) Carry out an appropriate test of whether the mean home and car claims are equal.
QUESTION 59.
In an investigation into a patient's red corpuscle count, the number of such corpuscles appearing in each of
400 cells of a haemocytometer was counted. The results were as follows:
No. of cells 40 66 93 94 62 25 14 5 1
It is thought that a Poisson distribution with mean $\mu$ provides an appropriate model for this situation.
(a) Estimate $\mu$.
(b) Test the fit of the Poisson model.
For a healthy person, the mean count per cell is known to be equal to 3. For a patient with certain types of
anaemia, the number of red blood corpuscles is known to be lower than this.
(c) Test whether this patient has one of these types of anaemia.
QUESTION 60.
In a recent study investigating a possible genetic link between individuals' susceptibility to developing
symptoms of AIDS, 549 men who had been diagnosed HIV positive were classified according to whether
they carried two particular alleles (DRB1*0702 and DQA1*0201). The results were as follows:
Condition of individual Free of symptoms Early symptoms Suffering from AIDS Total
Alleles present 24 7 17 48
Test whether there is an association between the presence of the alleles and the classification into the three
AIDS statuses using these results.
QUESTION 61.
A politician has said: 'A recent study in a particular area showed that 25% of the 400 teenagers who were
living in single-parent families had been in trouble with the police, compared with only 20% of the 1,200
teenagers who were living in two-parent families. Our aim is to reduce the number of single-parent families
in order to reduce the crime rates during the next decade.'
(i) Carry out a contingency table test at the 5% significance level to assess whether there is a significant
association between living in a single-parent family and getting into trouble with the police.
(ii) Comment on the politician's statement.
QUESTION 62.
A particular area in a town suffers a high burglary rate. A sample of 100 streets is taken, and in each of the
sampled streets, a sample of six similar houses is taken. The table below shows the number of sampled
houses, which have had burglaries during the last six months.
No. of streets f 39 38 18 4 0 1 0
(i) (a) State any assumptions needed to justify the use of a binomial model for the number of houses
per street which have been burgled during the last six months.
(b) Derive the maximum likelihood estimate of p, the probability that a house of the type sampled
has been burgled during the last six months.
(c) Determine the probabilities for the binomial model using your estimate of p.
(d) Comment on the fit without doing a formal test.
An insurance company works on the basis that the probability of a house being burgled over a six
month period is 0.18.
(ii) Carry out a test to investigate whether the binomial model with this value of p provides a good fit for
the data.
QUESTION 63.
It is desired to investigate the level of premium charged by two companies for contents policies for houses
in a certain area. Random samples of 10 houses insured by Company A are compared with 10 similar hous-
es insured by Company B. The premiums charged in each case are as follows
Company A 117 154 166 189 190 202 233 263 289 331
Company B 142 160 166 188 221 241 276 279 284 302
The line plots below show the sample values for the two companies:
(i) Comment briefly on the validity of the assumptions required for a two-sample t test for the premiums
of these two companies using the plots.
For these data: $\sum A = 2,134$, $\sum A^2 = 494,126$, $\sum B = 2,259$, $\sum B^2 = 541,463$.
(ii) Carry out a formal test to check that it is appropriate to apply a two-sample t test to these data, as-
suming that the premiums are normally distributed.
(iii) Test whether the level of premiums charged by Company B was significantly higher than that charged
by Company A, stating the p value and conclusion clearly.
(iv) (a) Calculate a 95% confidence interval for the difference between the proportions of premiums of
each company that are in excess of £200.
(b) Comment briefly on your result to part (iv)(a).
The average premium charged by Company A in the previous year was £170.
(v) Test whether Company A appears to have increased its premiums since the previous year.
ASSIGNMENT 4
CHAPTER
CORRELATION
QUESTION 1.
A sample of ten claims and corresponding payments on settlement for household policies is taken from the
business of an insurance company.
Claim x 2.10 2.40 2.50 3.20 3.60 3.80 4.10 4.20 4.50 5.00
Payment y 2.18 2.06 2.54 2.61 3.67 3.25 4.02 3.71 4.38 4.45
Draw a scatterplot and comment on the relationship between claims and payments.
QUESTION 2.
The rate of interest of borrowing, over the next five years, for ten companies is compared to each compa-
ny’s leverage ratio (its debt to equity ratio).
Leverage ratio, x 0.1 0.4 0.5 0.8 1.0 1.8 2.0 2.5 2.8 3.0
Interest rate (%), y 2.8 3.4 3.5 3.6 4.6 6.3 10.2 19.7 31.3 42.9
Draw a scatterplot and comment on the relationship between company borrowing (leverage) and interest
rate. Hence apply a transformation to obtain a linear relationship.
QUESTION 3.
Show that:
$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = \sum x_i^2 - n\bar{x}^2$
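The identity follows from expanding the square and using $\sum x_i = n\bar{x}$. A sketch of the steps, with all sums running over $i = 1, \ldots, n$:

```latex
\begin{aligned}
S_{xx} &= \sum (x_i - \bar{x})^2
        = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 \\
       &= \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
        = \sum x_i^2 - n\bar{x}^2 \\
       &= \sum x_i^2 - n\left(\frac{\sum x_i}{n}\right)^2
        = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
\end{aligned}
```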
QUESTION 4.
For the claims settlement data, we have:
Claim (£100's)x 2.10 2.40 2.50 3.20 3.60 3.80 4.10 4.20 4.50 5.00
Payment (£100's) y 2.18 2.06 2.54 2.61 3.67 3.25 4.02 3.71 4.38 4.45
Calculate Pearson's correlation coefficient for the claims settlement data and comment on its value.
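Pearson's coefficient is the ratio of $S_{xy}$ to the square root of $S_{xx} S_{yy}$. A minimal sketch computing it from the raw data, with illustrative variable names:

```python
from math import sqrt

# Claims and payments in units of £100 (from the question).
claims = [2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00]
payments = [2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45]

n = len(claims)
x_bar = sum(claims) / n
y_bar = sum(payments) / n

# Corrected sums of squares and cross-products.
s_xx = sum((x - x_bar) ** 2 for x in claims)
s_yy = sum((y - y_bar) ** 2 for y in payments)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(claims, payments))

# Pearson's sample correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

print("Sxx, Syy, Sxy:", round(s_xx, 4), round(s_yy, 4), round(s_xy, 4))
print("r:", round(r, 4))
```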
QUESTION 5.
For the original borrowing rate data:
Leverage ratio, x 0.1 0.4 0.5 0.8 1.0 1.8 2.0 2.5 2.8 3.0
Interest rate y 0.028 0.034 0.035 0.036 0.046 0.063 0.102 0.197 0.313 0.429
QUESTION 6.
Calculate Spearman's rank correlation coefficient for the claims settlement data and comment.
Claim (£100's)x 2.10 2.40 2.50 3.20 3.60 3.80 4.10 4.20 4.50 5.00
Payment (£100's) y 2.18 2.06 2.54 2.61 3.67 3.25 4.02 3.71 4.38 4.45
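With no tied values, Spearman's coefficient can be computed from the squared rank differences using the simplified formula $1 - 6\sum d^2 / (n(n^2 - 1))$. A sketch, where the helper `ranks` is hypothetical and assumes there are no ties:

```python
claims = [2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00]
payments = [2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45]


def ranks(values):
    """Rank each value from 1 (smallest); valid only when there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


rank_x = ranks(claims)
rank_y = ranks(payments)

n = len(claims)

# Sum of squared differences between the paired ranks.
d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))

# Simplified Spearman formula (no ties).
r_s = 1 - 6 * d_squared / (n * (n**2 - 1))

print("Ranks of payments:", rank_y)
print("Sum of d^2:", d_squared)
print("Spearman's r_s:", round(r_s, 4))
```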
QUESTION 7.
Calculate Spearman's rank correlation coefficient for the original borrowing rate data and comment.
Leverage ratio, x 0.1 0.4 0.5 0.8 1.0 1.8 2.0 2.5 2.8 3.0
Interest rate (%), y 2.8 3.4 3.5 3.6 4.6 6.3 10.2 19.7 31.3 42.9
QUESTION 8.
Calculate Kendall's rank correlation coefficient for the claims settlement data and comment.
Claim (£100's)x 2.10 2.40 2.50 3.20 3.60 3.80 4.10 4.20 4.50 5.00
Payment (£100's) y 2.18 2.06 2.54 2.61 3.67 3.25 4.02 3.71 4.38 4.45
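Kendall's coefficient compares every pair of observations and counts how many pairs are concordant and how many are discordant. A sketch, assuming no ties so that the denominator is simply $n(n-1)/2$:

```python
from itertools import combinations

claims = [2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00]
payments = [2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45]

n = len(claims)
concordant = 0
discordant = 0

# A pair is concordant when x and y move in the same direction,
# and discordant when they move in opposite directions.
for (x1, y1), (x2, y2) in combinations(zip(claims, payments), 2):
    direction = (x2 - x1) * (y2 - y1)
    if direction > 0:
        concordant += 1
    elif direction < 0:
        discordant += 1

total_pairs = n * (n - 1) // 2
tau = (concordant - discordant) / total_pairs

print("Concordant pairs:", concordant)
print("Discordant pairs:", discordant)
print("Kendall's tau:", round(tau, 4))
```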
QUESTION 9.
Calculate Kendall's rank correlation coefficient for the original borrowing rate data and comment on its
value.
Leverage ratio, x 0.1 0.4 0.5 0.8 1.0 1.8 2.0 2.5 2.8 3.0
Interest rate (%), y 2.8 3.4 3.5 3.6 4.6 6.3 10.2 19.7 31.3 42.9
QUESTION 10.
Claim (£100's)x 2.10 2.40 2.50 3.20 3.60 3.80 4.10 4.20 4.50 5.00
Payment (£100's) y 2.18 2.06 2.54 2.61 3.67 3.25 4.02 3.71 4.38 4.45
QUESTION 11.
Considering the data on claims and settlements, carry out the test:
$H_0: \rho = 0.9$ vs $H_1: \rho > 0.9$
Claim (£100's)x 2.10 2.40 2.50 3.20 3.60 3.80 4.10 4.20 4.50 5.00
Payment (£100's) y 2.18 2.06 2.54 2.61 3.67 3.25 4.02 3.71 4.38 4.45
QUESTION 12.
An actuary wants to investigate if there is any correlation between students' scores in the CS1 mock exam
and the CS2 mock exam. Data values from 22 students were collected and the results are:
Student 1 2 3 4 5 6 7 8 9 10 11
Student 12 13 14 15 16 17 18 19 20 21 22
You are given that $\sum d^2 = 494$, $n_c = 174$ and $n_d = 57$.
Test $H_0: \rho = 0$ vs $H_1: \rho \neq 0$ for the mock score data using Spearman's rank correlation coefficient and
Kendall's rank correlation coefficient.
QUESTION 13.
A new computerised ultrasound scanning technique has enabled doctors to monitor the weights of unborn
babies. The table below shows the estimated weights for one particular baby at fortnightly intervals during
the pregnancy.
Estimated baby weight (kg) 1.6 1.7 2.5 2.8 3.2 3.5
$\sum x = 210$, $\sum x^2 = 7,420$, $\sum y = 15.3$, $\sum y^2 = 42.03$, $\sum xy = 549.8$
(i) Show that $S_{xx} = 70$, $S_{yy} = 3.015$ and $S_{xy} = 14.3$.
(ii) Show that the (Pearson's) linear correlation coefficient is equal to 0.984 and comment.
(iii) Explain why the Spearman's and Kendall's rank correlation coefficients are both equal to 1.
(v) Test whether Pearson’s sample correlation coefficient supports the hypothesis that the true corre-
lation parameter is greater than 0.9.
QUESTION 14.
A schoolteacher is investigating the claim that class size does not affect GCSE results. His observations of
nine GCSE classes are as follows:
Class X1 X2 X3 X4 Y1 Y2 Y3 Y4 Y5
Students in class (c ) 35 32 27 21 34 30 28 24 7
Average GCSE point score for class (P ) 5.9 4.1 2.4 1.7 6.3 5.3 3.5 2.6 1.6
$\sum c = 238$, $\sum c^2 = 6,884$, $\sum p = 33.4$, $\sum p^2 = 149.62$, $\sum cp = 983$
(b) Use Pearson's correlation coefficient to test whether or not the data agree with the claim that
class size does not affect GCSE results.
Following his investigation, the teacher concludes, 'bigger class sizes improve GCSE results'.
QUESTION 15.
A university wishes to analyse the performance of its students on a particular degree course. It records the
scores obtained by a sample of 12 students at entry to the course, and the scores obtained in their final ex-
aminations by the same students. The results are as follows:
Student A B C D E F G H I J K L
$\sum x = 836$, $\sum y = 867$, $\sum x^2 = 60,016$, $\sum y^2 = 63,603$, $\sum (x - \bar{x})(y - \bar{y}) = 1,122$
(i) (a) Explain why Spearman's and Kendall's rank correlation coefficients cannot be calculated here
using the simplified formula.
(b) Calculate the Pearson's correlation coefficient.
(ii) Test whether this data comes from a population with Pearson's correlation coefficient equal to 0.75.
CHAPTER
LINEAR REGRESSION
QUESTION 16.
A sample of ten claims and corresponding payments on settlement for household policies is taken from the
business of an insurance company.
Claim x 2.10 2.40 2.50 3.20 3.60 3.80 4.10 4.20 4.50 5.00
Payment y 2.18 2.06 2.54 2.61 3.67 3.25 4.02 3.71 4.38 4.45
QUESTION 17.
Explain how to transform the relationship $Y = ab^x$ to a linear form.
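Taking logarithms of both sides is the usual route. A sketch of the transformation, assuming $a > 0$, $b > 0$ and $Y > 0$ so that the logarithms exist:

```latex
\begin{aligned}
Y &= ab^{x} \\
\ln Y &= \ln a + x \ln b
\end{aligned}
```

Writing $Y' = \ln Y$, $\alpha = \ln a$ and $\beta = \ln b$ gives the linear form $Y' = \alpha + \beta x$, where the primed and Greek symbols are labels introduced here for illustration.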
QUESTION 18.
QUESTION 19.
Claim x 2.10 2.40 2.50 3.20 3.60 3.80 4.10 4.20 4.50 5.00
Payment y 2.18 2.06 2.54 2.61 3.67 3.25 4.02 3.71 4.38 4.45
The sample of ten claims and payments above (in units of £100) has the following summations:
$\sum x = 35.4$, $\sum x^2 = 133.76$, $\sum y = 32.87$, $\sum y^2 = 115.2025$, $\sum xy = 123.81$
Calculate the fitted regression line and the estimated error variance.
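The least squares estimates can be built up from the summations alone. The sketch below uses the standard simple linear regression formulae, with the error variance estimated using divisor $n - 2$; the variable names are illustrative.

```python
n = 10

# Summations given in the question (units of £100).
sum_x, sum_x2 = 35.4, 133.76
sum_y, sum_y2 = 32.87, 115.2025
sum_xy = 123.81

# Corrected sums of squares and cross-products.
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least squares slope and intercept.
beta_hat = s_xy / s_xx
alpha_hat = sum_y / n - beta_hat * sum_x / n

# Estimated error variance: residual sum of squares over n - 2.
sigma2_hat = (s_yy - s_xy**2 / s_xx) / (n - 2)

print("Sxx, Syy, Sxy:", round(s_xx, 4), round(s_yy, 4), round(s_xy, 4))
print("Fitted line: y =", round(alpha_hat, 4), "+", round(beta_hat, 4), "x")
print("Estimated error variance:", round(sigma2_hat, 4))
```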
QUESTION 20.
For the claims settlement question above, calculate the expected payment on settlement for a claim of £350.
QUESTION 21.
Determine the split of total variation in the claims and payments model between the residual sum of
squares and the regression sum of squares.
Claim x 2.10 2.40 2.50 3.20 3.60 3.80 4.10 4.20 4.50 5.00
Payment y 2.18 2.06 2.54 2.61 3.67 3.25 4.02 3.71 4.38 4.45
Recall that: n = 10, $\sum x = 35.4$, $\sum y = 32.87$, $S_{xx} = 8.444$, $S_{yy} = 7.1588$, $S_{xy} = 7.4502$
QUESTION 22.
Calculate the coefficient of determination for the claims and payments model and comment on it.
Recall that: $SS_{TOT} = 7.1588$, $SS_{REG} = 6.5734$, $SS_{RES} = 0.5854$
QUESTION 23.
Calculate the correlation coefficient for the claims and payment data by using the coefficient of determina-
tion from the previous question.
QUESTION 24.
For the claims/settlements data:
(a) calculate a two-sided 95% confidence interval for $\beta$, the slope of the true regression line
Recall that: $S_{xx} = 8.444$, $S_{yy} = 7.1588$, $S_{xy} = 7.4502$, $\hat{\alpha} = 0.164$, $\hat{\beta} = 0.88231$, $\hat{\sigma}^2 = 0.0732$
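The interval for the slope uses the standard error $\sqrt{\hat{\sigma}^2 / S_{xx}}$ and a t distribution with $n - 2$ degrees of freedom. A sketch, assuming the full normal model and the tabulated upper 2.5% point of t with 8 degrees of freedom (2.306):

```python
from math import sqrt

# Quantities recalled in the question.
s_xx = 8.444
beta_hat = 0.88231
sigma2_hat = 0.0732

# Assumption: upper 2.5% point of t with n - 2 = 8 degrees of freedom, from tables.
T_8_UPPER_2_5 = 2.306

# Standard error of the slope estimator.
se_beta = sqrt(sigma2_hat / s_xx)

ci_slope = (
    beta_hat - T_8_UPPER_2_5 * se_beta,
    beta_hat + T_8_UPPER_2_5 * se_beta,
)

print("Standard error of slope:", round(se_beta, 4))
print("95% confidence interval for the slope:", ci_slope)
```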
QUESTION 25.
For the data set of 10 claims and their settlement payments, we had:
$SS_{TOT} = 7.1588$, $SS_{REG} = 6.5734$, $SS_{RES} = 0.5854$
Construct the ANOVA table and carry out an F test to assess whether $\beta = 0$.
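The F statistic is the ratio of the regression mean square (1 degree of freedom) to the residual mean square ($n - 2$ degrees of freedom). A sketch, assuming a 5% significance level and the tabulated upper 5% point of $F_{1,8}$ (5.318):

```python
n = 10

ss_reg = 6.5734
ss_res = 0.5854
ss_tot = 7.1588

# Degrees of freedom: 1 for regression, n - 2 for residual.
df_reg = 1
df_res = n - 2

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res

f_stat = ms_reg / ms_res

# Assumption: upper 5% point of F with (1, 8) degrees of freedom, from tables.
F_1_8_UPPER_5 = 5.318
reject_h0 = f_stat > F_1_8_UPPER_5

print("Source      df  SS      MS")
print("Regression ", df_reg, " ", ss_reg, round(ms_reg, 4))
print("Residual   ", df_res, " ", ss_res, round(ms_res, 4))
print("Total      ", df_reg + df_res, " ", ss_tot)
print("F statistic:", round(f_stat, 2))
print("Reject H0 (slope = 0) at the 5% level:", reject_h0)
```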
QUESTION 26.
Consider again the claims/settlements data.
Calculate:
(a) a 95% confidence interval for the expected payments on claims of £460.
(b) a 95% confidence interval for the predicted actual payments on claims of £460
QUESTION 27.
Claim x 2.10 2.40 2.50 3.20 3.60 3.80 4.10 4.20 4.50 5.00
Payment y 2.18 2.06 2.54 2.61 3.67 3.25 4.02 3.71 4.38 4.45
Calculate the residuals for the fitted regression model ŷ = 0.164 + 0.8823x.
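Each residual is the observed payment minus the fitted payment. A minimal sketch using the rounded coefficients quoted in the question:

```python
claims = [2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00]
payments = [2.18, 2.06, 2.54, 2.61, 3.67, 3.25, 4.02, 3.71, 4.38, 4.45]

# Rounded coefficients of the fitted line, as quoted in the question.
ALPHA_HAT = 0.164
BETA_HAT = 0.8823

fitted = [ALPHA_HAT + BETA_HAT * x for x in claims]
residuals = [y - y_hat for y, y_hat in zip(payments, fitted)]

for x, y, e in zip(claims, payments, residuals):
    print(f"x = {x:.2f}  y = {y:.2f}  residual = {e:+.4f}")

print("Sum of residuals:", round(sum(residuals), 4))
```

The residuals sum to approximately zero; the small discrepancy comes from rounding the coefficients.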
QUESTION 28.
A senior actuary wants to analyse the salaries of the 50 actuarial students employed by her company, using
a linear model based on number of exam passes and years of experience. Express this model and the avail-
able data.
QUESTION 29.
A new computerised ultrasound scanning technique has enabled doctors to monitor the weights of unborn
babies. The table below shows the estimated weights for one particular baby at fortnightly intervals during
the pregnancy.
Estimated baby weight (kg) 1.6 1.7 2.5 2.8 3.2 3.5
$\sum x = 210$, $\sum x^2 = 7,420$, $\sum y = 15.3$, $\sum y^2 = 42.03$, $\sum xy = 549.8$
(c) $\hat{\sigma}^2 = 0.0234$.
(ii) Calculate the baby's expected weight at 42 weeks (assuming it hasn't been born by then).
(iii) (a) Calculate the residual sum of squares and the regression sum of squares for these data.
(b) Calculate the coefficient of determination, $R^2$, and comment on its value.
(v) Construct an ANOVA table for the sum of squares from part (iii)(a) and carry out an F-test stating the
conclusion clearly.
(c) Hence, calculate a 90% confidence interval for the mean weight of a baby at 33 weeks.
(c) Hence, calculate a 90% confidence interval for the weight of an individual baby at 33 weeks.
(c) Comment on the fit of the model using the plot of the residuals against the x values.
QUESTION 30.
An analysis using the simple linear regression model based on 19 data points gave:
$s_{xx} = 12.2$, $s_{yy} = 10.6$, $s_{xy} = 8.1$
(iii) Comment on the results of the tests in parts (i) and (ii).
QUESTION 31.
The sums of the squares of the errors in a regression analysis are found to be:
QUESTION 32.
Explain how to transform the following models to linear form:
(i) $y_i = a + bx_i^2 + e_i$
(ii) $y_i = ae^{bx_i}$
QUESTION 33.
A university wishes to analyse the performance of its students on a particular degree course. It records the
scores obtained by a sample of 12 students at entry to the course, and the scores obtained in their final ex-
aminations by the same students. The results are as follows:
Student A B C D E F G H I J K L
$\sum x = 836$, $\sum y = 867$, $\sum x^2 = 60,016$, $\sum y^2 = 63,603$, $\sum (x - \bar{x})(y - \bar{y}) = 1,122$
(iii) Test whether the data are positively correlated by considering the slope parameter.
(iv) Calculate a 95% confidence interval for the mean finals paper score corresponding to an individual
entrance score of 53.
QUESTION 34.
The share price, in pence, of a certain company is monitored over an 8-year period. The results are shown
in the table below:
Time (years) 0 1 2 3 4 5 6 7 8
Price 100 131 183 247 330 454 601 819 1,095
$\sum (x_i - \bar{x})^2 = 60$, $\sum (y_i - \bar{y})^2 = 925,262$, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 7,087$
An actuary fits the following simple linear regression model to the data:
$y_i = \alpha + \beta x_i + e_i$, $i = 0, 1, \ldots, 8$
where $e_i$ are independent normal random variables with mean zero and variance $\sigma^2$.
(i) Determine the fitted regression line in which the price is modelled as the response and the time as an
explanatory variable.
(c) $\sigma^2$, the true underlying error variance.
(iii) (a) State the 'total sum of squares' and calculate its partition into the 'regression sum of squares' and
the 'residual sum of squares'.
(b) Calculate the 'proportion of variability explained by the model' using the values in part (iii)(a) to
(iv) The actuary decides to check the fit of the model by calculating the residuals.
Time (years) 0 1 2 3 4 5 6 7 8
(c) Comment on the appropriateness of the linear model by referring to the plot of the residuals
against time.
QUESTION 35.
A schoolteacher is investigating the claim that class size does not affect GCSE results. His observations of
nine GCSE classes are as follows:
Class X1 X2 X3 X4 Y1 Y2 Y3 Y4 Y5
Average GCSE point score for class (p) 5.9 4.1 2.4 1.7 6.3 5.3 3.5 2.6 1.6
$\sum c = 238$, $\sum c^2 = 6,884$, $\sum p = 33.4$, $\sum p^2 = 149.62$, $\sum cp = 983$
Class X5 was not included in the results above and contains 15 students.
(ii) (a) Calculate an estimate of the average GCSE point score for this individual class
(b) Calculate the standard error for the estimate in part (ii)(a) assuming the full normal model.
QUESTION 36.
An actuary is fitting the following linear regression model through the origin:
$Y_i = \beta x_i + e_i$, $e_i \sim N(0, \sigma^2)$, $i = 1, 2, \ldots, n$
$\hat{\beta} = \frac{\sum x_i Y_i}{\sum x_i^2}$
(ii) Derive the bias and mean square error of $\hat{\beta}$ under this model.
QUESTION 37.
A life assurance company is examining the force of mortality, $\mu_x$, of a particular group of policyholders. It is
thought that it is related to the age, x, of the policyholders by the formula:
$\mu_x = Bc^x$
$Y_i = \alpha + \beta x_i + \varepsilon_i$ where $\varepsilon_i \sim N(0, \sigma^2)$ are independently distributed
The summary results for eight ages were as follows:
Age, x 30 32 34 36 38 40 42 44
$\ln \mu_x$
$\sum x_i = 296$, $\sum x_i^2 = 11,120$, $\sum \ln \mu_{x_i} = -57.129$, $\sum (\ln \mu_{x_i})^2 = 408.50$, $\sum x_i \ln \mu_{x_i} = -2,104.5$
(i) (a) Apply a transformation to the original formula, $\mu_x = Bc^x$, to make it suitable for analysis by linear
regression.
(b) Write down expressions for Y, $\alpha$ and $\beta$ in terms of $\mu_x$, B and c using the transformation given in
part (i)(a).
(ii) Comment on the suitability of the regression model and state how this supports the transformation in
part (i)(a).
(iii) Use the data to calculate least squares estimates of B and c in the original formula.
Age, x 30 32 34 36 38 40 42 44
(v) (a) Calculate a 95% confidence interval for the mean predicted response $\ln \mu_{35}$.
(b) Hence obtain a 95% confidence interval for the mean predicted value of $\mu_{35}$.
QUESTION 38.
The government of a country suffering from hyperinflation has sponsored an economist to monitor the
price of a ‘basket’ of items in the population's staple diet over a one-year period. As part of his study, the
economist selected six days during the year and on each of these days visited a single nightclub, where he
recorded the price of a pint of lager. His report showed the following prices:
Price ($P_i$) 15 17 22 51 88 95
$\ln P_i$
$\sum i = 475$, $\sum i^2 = 54,403$, $\sum \ln P_i = 21.5953$, $\sum (\ln P_i)^2 = 81.1584$, $\sum i \ln P_i = 1,947.020$
The economist believes that the price of a pint of lager in a given bar on day i can be modelled by:
$\ln P_i = a + bi + e_i$
where a and b are constants and the $e_i$'s are uncorrelated $N(0, \sigma^2)$ random variables.
(i) Estimate the values of a, b and $\sigma^2$.
(iv) Determine a 95% confidence interval for the average price of a pint of lager on day 365:
QUESTION 39.
(i) Show that the maximum likelihood estimates (MLEs) of $\alpha$ and $\beta$ in the simple linear regression model
(ii) Show that the MLE of $\sigma^2$ has a different denominator from the least squares estimate
QUESTION 40.
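The effectiveness of a tablet containing $x_1$ mg of drug 1 and $x_2$ mg of drug 2 is being tested. In trials the
% effectiveness, y x1 x2
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + e$
$\sum y_i = n\alpha + \beta_1 \sum x_{i1} + \beta_2 \sum x_{i2}$
$\sum y_i x_{i1} = \alpha \sum x_{i1} + \beta_1 \sum x_{i1}^2 + \beta_2 \sum x_{i2} x_{i1}$
$\sum y_i x_{i2} = \alpha \sum x_{i2} + \beta_1 \sum x_{i1} x_{i2} + \beta_2 \sum x_{i2}^2$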
(b) Hence, using the above data, show that the fitted model is:
(ii) Comment on the significance of the parameters by considering the following output from R for this
model.
Coefficients:
Estimate Std. Error t value Pr( >| t | )
(Intercept) 25.308441 2.002062 12.64 0.006200 **
drug_1 1.193671 0.036592 32.62 0.000938 ***
drug_2 0.301468 0.007048 42.77 0.000546 ***
The coefficient of determination for the fitted model is $R^2 = 0.9992$.
(iii) Calculate the adjusted $R^2$.
The ANOVA table for the model is
Regression 2 49.1137 *
Residual 2 0.0383 *
Total 4 49.152
(iv) Calculate the missing values and the F statistic, then carry out the F test, stating the conclusion clearly.
(v) Calculate the percentage effectiveness for a tablet containing 51.3 mg of drug 1 ($x_1$) and 18.3 mg of drug 2 ($x_2$).
The plot of the residuals against the fitted values and the Q-Q plot of the residuals are given below.
(vi) Comment on the fit of the model, making reference to the plots given above.
It is thought that the two drugs might have an interactive effect.
(vii) (a) Explain what this means.
(b) Write down the formula for the regression model that has the two drugs as main effects and also
their interaction.
The model in part (vii)(b) has an adjusted $R^2$ of 0.9969.
(c) Comment on whether the new model is an improvement.
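The arithmetic behind parts (iii), (iv), (v) and (vii)(c) can be sketched as below. This is only an illustration under stated assumptions: the sample size of 5 is inferred from the total degrees of freedom of 4, the coefficient values are those in the R output, and the 5% critical value for F with (2, 2) degrees of freedom is taken from tables.

```python
# Figures quoted in the question.
r_squared = 0.9992
n_obs = 5            # assumption: total df of 4 implies 5 observations
n_predictors = 2

# Adjusted R^2 penalises the number of fitted predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# ANOVA table: mean squares and F statistic.
ss_reg, df_reg = 49.1137, 2
ss_res, df_res = 0.0383, 2
ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res
F_CRIT_5PCT = 19.00  # upper 5% point of F(2, 2), tables value
reject_no_relationship = f_stat > F_CRIT_5PCT

# Fitted value for a tablet with 51.3 mg of drug 1 and 18.3 mg of drug 2.
intercept, coef_drug_1, coef_drug_2 = 25.308441, 1.193671, 0.301468
predicted = intercept + coef_drug_1 * 51.3 + coef_drug_2 * 18.3

# Compare against the interaction model's adjusted R^2 quoted in the question.
adj_r_squared_interaction = 0.9969
interaction_improves = adj_r_squared_interaction > adj_r_squared

print(f"adjusted R^2 = {adj_r_squared:.4f}")
print(f"MS_reg = {ms_reg:.4f}, MS_res = {ms_res:.5f}, F = {f_stat:.1f}")
print(f"predicted effectiveness = {predicted:.2f}%")
print(f"interaction model improves adjusted R^2: {interaction_improves}")
```

Because adjusted $R^2$ falls when the interaction term is added, the extra parameter does not appear to earn its place on this criterion.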
CHAPTER
35 20 0
37 20 1
45 30 0
55 30 5
Show that it is impossible to individually estimate all the parameters in the linear predictor.
QUESTION 42.
In UK motor insurance business, vehicle-rating group is also used as a factor. Vehicles are divided into
twenty categories numbered 1 to 20, with group 20 including those vehicles that are most expensive to re-
pair.
Suppose that we have a three-factor model specified as age*(sex + vehicle group). Determine the linear
predictor for a model of this type.
QUESTION 43.
Claim amounts for medical insurance claims for hamsters are believed to have an exponential distribution
with mean $\mu_i$:
$f(y_i) = \frac{1}{\mu_i} e^{-y_i/\mu_i} = \exp\left(-\frac{y_i}{\mu_i} - \log \mu_i\right)$
We have the following data for hamsters’ medical claims, using the model above:
age $x_i$ (months) 4 8 10 11 17
The insurer believes that a linear function of age affects the claim amount:
$\eta_i = \alpha + \beta x_i$
Using the canonical link function, write down (but do not try to solve) the equations satisfied by the maximum
likelihood estimates for $\alpha$ and $\beta$, based on the above data.
QUESTION 44.
Claim amounts for medical insurance claims for hamsters are believed to have an exponential distribution
with mean $\mu_i$:
$f(y_i) = \frac{1}{\mu_i} e^{-y_i/\mu_i} = \exp\left(-\frac{y_i}{\mu_i} - \log \mu_i\right)$
We are given the following data for hamsters' medical claims, using the model above:
age $x_i$ (months) 4 8 10 11 17
The insurer believes that a model with 5 categories for age is sufficiently accurate:
$\eta_i = \alpha_i$, $i = 1, 2, 3, 4, 5$
Using the canonical link function, show that the fitted values $\hat{\mu}_i$ are the observed claim amounts, $y_i$.
QUESTION 45.
Explain the difference between the two types of covariate: a variable and a factor
QUESTION 46.
$f(y) = \exp\left[\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right]$
(i) State the mean and variance of $Y$ in terms of $b(\theta)$ and its derivatives and $a(\phi)$.
(ii) (a) Show that an exponentially distributed random variable with mean $\mu$ has a density that can be
written in the above form.
(b) Determine the natural parameter and the variance function.
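One way to organise the working for part (ii) is sketched below. It assumes the form referred to is the standard exponential family density $f(y) = \exp\{[y\theta - b(\theta)]/a(\phi) + c(y,\phi)\}$, with $\theta$ the natural parameter; the identification of $a$, $b$ and $c$ is a choice made for this illustration.

```latex
\begin{align*}
f(y) &= \frac{1}{\mu}\, e^{-y/\mu} = \exp\left( -\frac{y}{\mu} - \log \mu \right) \\
\theta &= -\frac{1}{\mu}, \qquad b(\theta) = -\log(-\theta) = \log \mu, \qquad a(\phi) = 1, \qquad c(y, \phi) = 0 \\
E[Y] &= b'(\theta) = -\frac{1}{\theta} = \mu \\
V(\mu) &= b''(\theta) = \frac{1}{\theta^2} = \mu^2, \qquad \operatorname{var}(Y) = a(\phi)\, b''(\theta) = \mu^2
\end{align*}
```

The last line recovers the familiar result that an exponential random variable has variance equal to the square of its mean.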
QUESTION 47.
An insurer wishes to use a generalised linear model to analyse the claim numbers on its motor portfolio. It
has collected the following data on claim numbers $y_i$, $i = 1, 2, \ldots, 35$, from three different classes of policy:
Class I 1 2 0 2 1 0 0 2 2 1
Class II 1 0 1 1 0
Class III 0 0 0 0 0 1 0 1 0 0
1 0 1 0 0 0 0 0 0 0
$\log \mu_i = \begin{cases} \alpha & i = 1, 2, \ldots, 10 \\ \beta & i = 11, 12, \ldots, 15 \\ \gamma & i = 16, 17, \ldots, 35 \end{cases}$
(ii) Derive the likelihood function for this model, and hence find the maximum likelihood estimates for
$\alpha$, $\beta$ and $\gamma$.
The insurer now analyses the simpler model $\log \mu_i = \alpha$, for all policies.
(iii) Calculate the maximum likelihood estimate for $\alpha$ under this model (Model B).
(iv) (a) Show that the scaled deviance for Model A is 24.93.
(v) Compare Model A directly with Model B, by calculating an appropriate test statistic.
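A short numerical check of parts (ii) to (v), offered as a sketch. It assumes that claim numbers are Poisson with a log link, that Model A has one parameter per class and Model B a single common parameter, and that the 5% chi-squared critical value on 2 degrees of freedom is taken from tables. The helper `poisson_deviance` is a name introduced here for illustration.

```python
import math

# Claim numbers by class, as listed in the question.
class_claims = {
    "I": [1, 2, 0, 2, 1, 0, 0, 2, 2, 1],
    "II": [1, 0, 1, 1, 0],
    "III": [0, 0, 0, 0, 0, 1, 0, 1, 0, 0,
            1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
}


def poisson_deviance(ys, mu):
    """Scaled deviance 2 * sum[y ln(y/mu) - (y - mu)], taking 0 ln 0 as 0."""
    total = 0.0
    for y in ys:
        if y > 0:
            total += y * math.log(y / mu)
        total -= y - mu
    return 2.0 * total


# Model A: the MLE of each class mean is the class sample mean.
means_a = {c: sum(ys) / len(ys) for c, ys in class_claims.items()}
params_a = {c: math.log(m) for c, m in means_a.items()}
deviance_a = sum(poisson_deviance(ys, means_a[c]) for c, ys in class_claims.items())

# Model B: a single mean for all 35 policies.
all_claims = [y for ys in class_claims.values() for y in ys]
mean_b = sum(all_claims) / len(all_claims)
param_b = math.log(mean_b)
deviance_b = poisson_deviance(all_claims, mean_b)

# Model B is nested in Model A, so compare the drop in deviance with chi-squared(2).
test_stat = deviance_b - deviance_a
CHI2_2DF_5PCT = 5.991  # tables value
prefer_model_a = test_stat > CHI2_2DF_5PCT

print(f"Model A log-means: {params_a}")
print(f"Model B log-mean: {param_b:.4f}")
print(f"deviance A = {deviance_a:.2f}, deviance B = {deviance_b:.2f}")
print(f"test statistic = {test_stat:.2f}, prefer Model A: {prefer_model_a}")
```

The Model A deviance reproduces the 24.93 quoted in part (iv)(a), which supports the Poisson reading of the model.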
QUESTION 48.
In the context of generalised linear models, consider the exponential distribution with density function f(x),
where:
$f(x) = \frac{1}{\mu} e^{-x/\mu}, \quad x > 0$
(i) Show that f(x) can be written in the form of the exponential family of distributions.
1
(ii) Show that the canonical link function, , is given by .
QUESTION 49.
The random variable Z i has a binomial distribution with parameters n and i , where 0 i 1 .
(i) Show that the distribution of Yi is a member of the exponential family, stating clearly the natural and
QUESTION 50.
A statistical distribution is said to be a member of the exponential family if its probability function or
probability density function can be expressed in the form:
$f_Y(y; \theta, \phi) = \exp\left[\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right]$
(i) Show that the mean of such a distribution is $b'(\theta)$ and derive the corresponding formula for the
variance by differentiating the following expression with respect to $\theta$:
$\int_y f(y)\, dy = 1$
(ii) Use this method to determine formulae for the mean and variance of the gamma distribution with
density function:
f x x 0
1 x/
x e
QUESTION 51.
Independent claim amounts $Y_1, Y_2, \ldots, Y_n$ are modelled as exponential random variables with $E[Y_i] = \mu_i$,
$i = 1, 2, \ldots, n$. The fitted values for a particular model are denoted by $\hat{\mu}_i$.
QUESTION 52.
A small insurer wishes to model its claim costs for motor insurance using a simple generalised linear model
based on the three factors:
$YO_i = 1$ for 'young' drivers
$YO_i = 0$ for 'old' drivers
The insurer is considering three possible models for the linear predictor:
Model 1: YO+FS + TC
Model 2: YO+FS + YO.FS + TC
Model 3: YO*FS*TC
(i) Write each of these models in parameterised form, stating how many non-zero parameter values are
present in each model.
(ii) Explain why Model 1 might not be appropriate and why the insurer may wish to avoid using Model 3.
The student fitting the models has said 'We are assuming a normal error structure and we are using
the canonical link function.'
(iii) Explain what this means.
The table below shows the student's calculated values of the scaled deviance for these three models
and the constant model.
Model                  Scaled deviance   Degrees of freedom
1                      50                7
YO + FS + TC           10
YO + FS + YO.FS + TC   5
YO*FS*TC               0
(iv) (a) Complete the table by filling in the missing entries in the degrees of freedom column.
(b) Carry out the calculations necessary to determine which model would be the most appropriate.
QUESTION 53.
The following study was carried out into the mortality of leukaemia sufferers. A white blood cell count was
taken from each of 17 patients and their survival times were recorded.
Suppose that $Y_i$ represents the survival time (in weeks) of the $i$th patient and $x_i$ represents the logarithm
(to the base 10) of the $i$th patient's initial white blood cell count ($i = 1, 2, \ldots, 17$).
The response variables $Y_i$ are assumed to be exponentially distributed. A possible specification
for $E[Y_i]$ is $E[Y_i] = \exp(\alpha + \beta x_i)$. This will ensure that $E[Y_i]$ is non-negative for all values of $x_i$.
(i) Write down the natural link function associated with the linear predictor $\eta_i = \alpha + \beta x_i$.
(ii) Use this link function and linear predictor to derive the equations that must be solved in order to obtain
the maximum likelihood estimates of $\alpha$ and $\beta$.
The maximum likelihood estimate of $\alpha$ derived from the experimental data is $\hat{\alpha} = 8.477$, with estimated
standard error 1.655.
(iii) Construct an approximate 95% confidence interval for $\alpha$ and interpret this result.
The following two models are now to be compared:
Model 1: $E[Y_i] = \alpha$
Model 2: $E[Y_i] = \alpha + \beta x_i$
The scaled deviance for Model 1 is found to be 26.282 and the scaled deviance for Model 2 is 19.457.
(iv) Test the null hypothesis that $\beta = 0$ against the alternative hypothesis that $\beta \neq 0$, stating any conclusions
clearly.
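The calculations for parts (iii) and (iv) reduce to a normal-approximation interval and a comparison of scaled deviances. The sketch below assumes the usual asymptotic normality of the maximum likelihood estimator and that the two models differ by one parameter; the critical values are tables values, not computed.

```python
# Part (iii): approximate 95% interval, estimate +/- 1.96 standard errors.
estimate, std_err = 8.477, 1.655
Z_975 = 1.96  # upper 2.5% point of the standard normal (tables value)
ci = (estimate - Z_975 * std_err, estimate + Z_975 * std_err)
interval_contains_zero = ci[0] <= 0.0 <= ci[1]

# Part (iv): the drop in scaled deviance between nested models is compared
# with chi-squared on 1 degree of freedom (one extra parameter in Model 2).
deviance_model_1, deviance_model_2 = 26.282, 19.457
deviance_drop = deviance_model_1 - deviance_model_2
CHI2_1DF_5PCT = 3.841  # tables value
reject_null = deviance_drop > CHI2_1DF_5PCT

print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f}), contains zero: {interval_contains_zero}")
print(f"deviance drop = {deviance_drop:.3f}, reject null at 5%: {reject_null}")
```

An interval that excludes zero and a deviance drop above the critical value both point towards keeping the corresponding parameter in the model.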
ASSIGNMENT 5
CHAPTER
BAYESIAN STATISTICS
QUESTION 1.
Three manufacturers supply clothing to a retailer. 60% of the stock comes from Manufacturer 1, 30% from
Manufacturer 2 and 10% from Manufacturer 3. 10% of the clothing from Manufacturer 1 is faulty, 5% from
Manufacturer 2 is faulty and 15% from Manufacturer 3 is faulty.
QUESTION 2.
The annual number of claims arising from a particular group of policies follows a Poisson distribution with
mean $\lambda$. The prior distribution of $\lambda$ is exponential with mean 30.
In the previous two years, the numbers of claims arising from the group were 28 and 26, respectively.
QUESTION 3.
Suppose that X 1 , X 2 ,..., X n is a random sample from a Type 1 geometric distribution with parameter p ,
Determine a family of distributions for p that would result in conjugate prior and posterior distributions.
QUESTION 4.
The number of claims received per week from a certain portfolio has a Poisson distribution with mean .
The prior distribution of is as follows:
1 2 3
Given that 3 claims were received last week, determine the posterior distribution of .
QUESTION 5.
A random sample of size 10 from a Poisson distribution with mean yields the following data values:
3, 4, 3, 1, 5, 5, 2, 3, 3, 2
QUESTION 6.
A random sample of size 15 from a normal distribution with mean $\mu$ and standard deviation 3 yields the
following data values:
10.75 -0.29 5.37 6.68 8.77 1.69 7.12 4.89 6.45 4.27 9.37 5.68 3.87 7.70 6.98
Calculate an equal-tailed 95% Bayesian credible interval for $\mu$ based on these data values. You are given
that the posterior distribution of $\mu$ is $N(5.83, 0.722^2)$.
QUESTION 7.
The punctuality of trains has been investigated by considering a number of train journeys. In the sample,
60% of trains had a destination of Manchester, 20% Edinburgh and 20% Birmingham. The probabilities of
a train arriving late in Manchester, Edinburgh or Birmingham are 30%, 20% and 25%, respectively.
QUESTION 8.
A random variable $X$ has a Poisson distribution with mean $\lambda$, which is initially assumed to have a chi-squared
distribution with 4 degrees of freedom.
Determine the posterior distribution of $\lambda$ after observing a single value $x$ of the random variable $X$.
QUESTION 9.
The number of claims in a week arising from a certain group of insurance policies has a Poisson distribu-
tion with mean . Seven claims were incurred in the last week.
QUESTION 10.
For the estimation of a population proportion p, a sample of n is taken and yields x successes. A suitable
prior distribution for p is beta with parameters 4 and 4.
(i) Show that the posterior distribution of p given x is beta and specify its parameters.
11 successes are observed in a sample of size 25.
(ii) Calculate the Bayesian estimate under all-or-nothing (0/1) loss.
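The update in this question follows the standard beta-binomial conjugacy, sketched below. It assumes a binomial likelihood for the number of successes, so a Beta(a, b) prior becomes Beta(a + x, b + n - x); under all-or-nothing loss the Bayesian estimate is the posterior mode.

```python
# Prior Beta(4, 4) and observed data from the question.
prior_a, prior_b = 4, 4
n_trials, successes = 25, 11

# Conjugate update: add successes to the first parameter, failures to the second.
post_a = prior_a + successes
post_b = prior_b + (n_trials - successes)

# Posterior mode of Beta(a, b) for a, b > 1 is (a - 1) / (a + b - 2).
post_mode = (post_a - 1) / (post_a + post_b - 2)

# Posterior mean, shown for comparison (the estimate under quadratic loss).
post_mean = post_a / (post_a + post_b)

print(f"posterior: Beta({post_a}, {post_b})")
print(f"estimate under 0/1 loss (mode): {post_mode:.4f}")
print(f"posterior mean for comparison: {post_mean:.4f}")
```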
QUESTION 11.
The annual number of claims from a particular risk has a Poisson distribution with mean $\mu$. The prior
distribution for $\mu$ has a gamma distribution with $\alpha = 2$ and $\lambda = 5$.
Claim numbers $x_1, \ldots, x_n$ over the last $n$ years have been recorded.
(i) Show that the posterior distribution is gamma and determine its parameters.
Now suppose that $n = 8$ and $\sum_{i=1}^{8} x_i = 5$.
QUESTION 12.
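A single observation, $x$, is drawn from a distribution with the probability density function:
$f(x \mid \theta) = \begin{cases} \dfrac{1}{\theta} & 0 < x < \theta \\ 0 & \text{otherwise} \end{cases}$
$f(\theta) = \theta \exp(-\theta), \quad \theta > 0$
Derive an expression in terms of $x$ for the Bayesian estimate of $\theta$ under absolute error loss.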
QUESTION 13.
A proportion p of packets of a rather dull breakfast cereal contain an exciting toy (independently from
packet to packet). An actuary has been persuaded by his children to begin buying packets of this cereal. His
prior beliefs about p before opening any packets are given by a uniform distribution on the interval [0,1]. It
turns out the first toy is found in the $n_1$th packet of cereal.
(i) Determine the posterior distribution of p after the first toy is found.
A further toy was found after opening another $n_2$ packets, another toy after opening another $n_3$ packets
and so on until the fifth toy was found after opening a grand total of $n_1 + n_2 + n_3 + n_4 + n_5$ packets.
(ii) Determine the posterior distribution of p after the fifth toy is found.
(iii) Show the Bayes' estimate of p under quadratic loss is not the same as the maximum likelihood esti-
mate and comment on this result.
QUESTION 14.
An actuary has a tendency to be late for work. If he gets up late then he arrives at work X minutes late
where X is exponentially distributed with mean 15. If he gets up on time then he arrives at work Y minutes
late where Y is uniformly distributed on [0,25]. The office manager believes that the actuary gets up late
one third of the time.
Calculate the posterior probability that the actuary did in fact get up late given that he arrives more than 20
minutes late at work.
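This is a direct application of Bayes' theorem with two hypotheses. The sketch below uses the exponential survival function and the uniform tail probability stated in the question; the variable names are illustrative only.

```python
import math

# Prior probabilities of the two hypotheses.
p_up_late = 1 / 3
p_on_time = 1 - p_up_late

# Likelihoods of arriving more than 20 minutes late under each hypothesis.
threshold = 20
p_tail_given_late = math.exp(-threshold / 15)        # exponential with mean 15
p_tail_given_on_time = (25 - threshold) / 25         # uniform on [0, 25]

# Bayes' theorem: posterior is proportional to prior times likelihood.
numerator = p_up_late * p_tail_given_late
evidence = numerator + p_on_time * p_tail_given_on_time
posterior_up_late = numerator / evidence

print(f"P(up late | more than {threshold} min late) = {posterior_up_late:.4f}")
```

The posterior is only slightly above the prior of one third because the two tail probabilities are of similar size.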
CHAPTER
CREDIBILITY THEORY
QUESTION 15.
A specialist insurer that provides insurance against breakdown of photocopying equipment calculates its
premiums using a credibility formula. Based on the company’s recent experience of all models of copiers,
the premium for this year should be £100 per machine. The company's experience for a new model of copi-
er, which is considered to be more reliable, indicates that the premium should be £60 per machine.
Given that the credibility factor is 0.75, calculate the premium that should be charged for insuring the new
model.
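A credibility premium is a weighted average of the risk's own experience and the collateral (portfolio-wide) figure. The helper below is a hypothetical function written for illustration, assuming the credibility factor is applied to the new model's own experience.

```python
def credibility_premium(z: float, own_experience: float, collateral: float) -> float:
    """Return Z * own experience + (1 - Z) * collateral estimate."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("credibility factor must lie in [0, 1]")
    return z * own_experience + (1.0 - z) * collateral


# New copier model: own experience 60, all-models figure 100, Z = 0.75.
premium = credibility_premium(0.75, own_experience=60.0, collateral=100.0)
print(f"credibility premium = {premium:.2f} per machine")
```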
QUESTION 16.
An insurer is setting the premium rate for the buildings in an industrial estate. Past experience for the es-
tate indicates that a premium rate of £3 per £1,000 sum insured should be charged. The past experience of
other similar estates for which the insurer provides cover indicates a premium rate of £5 per £1,000 sum
insured. The insurer uses a credibility factor of 75% for this risk.
QUESTION 17.
Claim amounts on a portfolio of insurance policies have an unknown mean $\mu$. Prior beliefs about $\mu$ are
described by a distribution with mean $\mu_0$ and variance $\sigma_0^2$. Data are collected from $n$ claims with mean claim
amount $\bar{x}$ and variance $s^2$. A credibility estimate of $\mu$ is to be made, of the form:
$Z\bar{x} + (1 - Z)\mu_0$
2
n0
B. 2
n0 n
2
0
C. 2
n 0
QUESTION 18.
The total claim amount per annum on a particular insurance policy follows a normal distribution with
unknown mean $\mu$ and variance $200^2$. Prior beliefs about $\mu$ are described by a normal distribution with mean
600 and variance $50^2$. Claim amounts $x_1, x_2, \ldots, x_n$ are observed over $n$ years.
(ii) Show that the mean of the posterior distribution of $\mu$ can be written in the form of a credibility
estimate.
Now suppose that n = 5 and that total claims over the five years were 3,400.
QUESTION 19.
A statistician wishes to obtain a Bayesian estimate of the mean of an exponential distribution with density
function $f(x) = \frac{1}{\mu} e^{-x/\mu}$. He is proposing to use a prior distribution with PDF:
/
e
f , 0
1
(i) Write down the likelihood function for , based on observations x1,...,xn from an exponential distribu-
tion.
(b) Hence show that an expression for the Bayesian estimate for under squared error loss is:
xi
ˆ
n 1
(iii) Show that the Bayesian estimate for can be written in the form of a credibility estimate, giving a
formula for the credibility factor.
The statistician decides to use a prior distribution of this form with parameters 40 and 1.5 .
You are given the following summary statistics from the sample data:
(iv) Calculate the Bayesian estimate of and the value of the credibility factor.
QUESTION 20.
Let $\theta$ denote the proportion of insurance policies in a certain portfolio on which a claim is made. Prior
beliefs about $\theta$ are described by a beta distribution with parameters $\alpha$ and $\beta$.
Underwriters are able to estimate the mean $\mu$ and variance $\sigma^2$ of $\theta$.
A random sample of n policies is taken and it is observed that claims had arisen on d of them.
(b) Show that the mean of the posterior distribution can be written in the form of a credibility esti-
mate.
CHAPTER
EMPIRICAL BAYES
CREDIBILITY THEORY
QUESTION 21.
The table below shows the aggregate claim amounts (in £m) for an international insurer's fire portfolio for
a 5-year period, together with some summary statistics.
             Year, j
Country, i   1    2    3    4    5    $\bar{x}_i$   $\frac{1}{4}\sum_{j=1}^{5}(x_{ij} - \bar{x}_i)^2$
1            48   53   42   50   59   50.4   39.3
2            64   71   64   73   70   68.4   17.3
3            85   54   76   65   90   74.0   215.5
4            44   52   69   55   71   ?      ?
(i) Fill in the missing entries in the last row of the table.
(ii) Estimate the values of $E[m(\theta)]$, $E[s^2(\theta)]$ and $\operatorname{var}[m(\theta)]$ using EBCT Model 1, and hence estimate
the credibility factor, $Z$.
(iii) Calculate the credibility premium for each country using EBCT Model 1.
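The EBCT Model 1 calculation can be sketched directly from the raw claims table. The estimators used are the standard ones: the overall mean for $E[m(\theta)]$, the average within-country sample variance for $E[s^2(\theta)]$, and the between-country variance of the means less $E[s^2(\theta)]/n$ for $\operatorname{var}[m(\theta)]$. Variable names are illustrative.

```python
# Aggregate claims (in £m): one row per country, one column per year.
claims = [
    [48, 53, 42, 50, 59],
    [64, 71, 64, 73, 70],
    [85, 54, 76, 65, 90],
    [44, 52, 69, 55, 71],
]
n_risks = len(claims)
n_years = len(claims[0])

# Within-country means and sample variances (divisor n - 1).
row_means = [sum(row) / n_years for row in claims]
row_vars = [
    sum((x - m) ** 2 for x in row) / (n_years - 1)
    for row, m in zip(claims, row_means)
]

# EBCT Model 1 parameter estimates.
est_mean = sum(row_means) / n_risks                       # E[m(theta)]
est_s2 = sum(row_vars) / n_risks                          # E[s^2(theta)]
between = sum((m - est_mean) ** 2 for m in row_means) / (n_risks - 1)
est_var_m = between - est_s2 / n_years                    # var[m(theta)]

# Credibility factor and premiums.
z = n_years / (n_years + est_s2 / est_var_m)
premiums = [z * m + (1 - z) * est_mean for m in row_means]

print(f"row 4: mean = {row_means[3]:.1f}, variance = {row_vars[3]:.1f}")
print(f"E[m] = {est_mean:.2f}, E[s^2] = {est_s2:.1f}, var[m] = {est_var_m:.2f}")
print(f"Z = {z:.4f}")
for country, premium in enumerate(premiums, start=1):
    print(f"country {country}: credibility premium = {premium:.2f}")
```

The same credibility factor applies to every country under Model 1 because each has the same number of years of data.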
QUESTION 22.
The figures given in the table below are the aggregate claims (in £000s) for each of four risks over a period
of four years.
(i) Assuming that the data satisfy the assumptions of EBCT Model 1, estimate the aggregate claim amount
for Risk 1 in Year 5.
(ii) Use EBCT Model 2 to estimate the aggregate claim amount for Risk 1 in Year 5. The corresponding fig-
ure for Risk 2 is 3,050.8.
QUESTION 23.
Consider the following statements made about EBCT Model 1.
(c) None of the random variables or parameters in the model are assumed to have a normal distribution.
QUESTION 24.
The table below shows the aggregate claim amounts (in £m) for an international insurer's fire portfolio for
a 5-year period, together with some summary statistics.
             Year, j
Country, i   1    2    3    4    5
1            48   53   42   50   59
2            64   71   64   73   70
3            85   54   76   65   90
4            44   52   69   55   71
The volumes of business for each country for the insurer are as follows:
             Year, j
Country, i   1    2    3    4    5    6
1            12   15   13   16   10   20
2            20   14   22   15   30   25
3            5    8    6    12   4    10
4            22   35   30   16   10   12
Calculate the credibility premium for each country in Year 6 using EBCT Model 2
QUESTION 25.
An actuary has, for three years, recorded the volume of unsolicited advertising that he receives. He believes
that the number of items that he receives follows a Poisson distribution with a mean which varies accord-
ing to which quarter of the year it is. He has recorded Yij the number of items received in the i th quarter of
the j th year (i = 1,2,3,4 and j = 1,2,3). The actuary wishes to estimate the number of items that he will re-
ceive in the first quarter of year four. He has recorded the following data:
i     $Y_{i1}$   $Y_{i2}$   $Y_{i3}$   $\bar{Y}_i = \frac{1}{3}\sum_j Y_{ij}$   $\sum_j (Y_{ij} - \bar{Y}_i)^2$
i=3   75   83   88   82   86
(i) Estimate $Y_{1,4}$, the number of items that the actuary expects to receive in the first quarter of year four,
using the assumptions of EBCT Model 1.
The actuary believes that, in fact, the volume of items has been increasing at the rate of 10% per annum.
(ii) Suggest how the approach in (i) can be adjusted to produce a revised estimate taking this growth into
account.
(iii) Calculate the maximum likelihood estimate of $Y_{1,4}$ (based on the quarter one data already observed
and the 10% pa increase described above).
(iv) Compare the assumptions underlying the approach in (i) and (ii) with those underlying the approach
in (iii).
QUESTION 26.
An actuary wishes to analyse the amounts paid by a group of insurers on their respective portfolios of
commercial property insurance policies using the models of Empirical Bayes Credibility Theory.
The actuary obtains the following information about the amounts of claim payments made and the number
of policies sold for each of three different insurers. The data obtained are as follows
(i) Calculate the expected total claim payment to be made by Insurer B in the coming year under EBCT
Model 1.
(ii) Calculate the expected payout amount for Insurer B in the coming year using EBCT Model 2, assuming
that the expected number of policies sold for the coming year for Insurer B is 4,800.
You may use the summary statistics given below, which have been calculated using the formulae and
notation given in the Tables, again working in millions of pounds. Subscripts 1, 2 and 3 refer to Insur-
ers A, B and C respectively.
QUESTION 27.
An actuarial student is using Empirical Bayes Credibility Theory Model 2 to calculate credibility premiums
for a group of insurers. The student has analysed the data for six different insurers, using 10 years of past
data for each insurer and has obtained the following figures:
$\sum_{i=1}^{6} \sum_{j=1}^{10} P_{ij} = 1{,}498$, $P^* = 18.24$
The estimated values of $E[m(\theta)]$, $E[s^2(\theta)]$ and $\operatorname{var}[m(\theta)]$ based on the data from the six insurers are
4.00, 62.8 and 42.1, respectively.
The student has just received the following information relating to a seventh insurer (Insurer I), and he
wishes to update the estimates of $E[m(\theta)]$, $E[s^2(\theta)]$ and $\operatorname{var}[m(\theta)]$ using the claims data for Insurer I
given in the table below:
Year, j                            1    2    3    4    5    6    7    8    9    10
Aggregate claim amount, $y_{ij}$   100  85   90   102  109  106  128  132  150  131
Risk volume, $P_{ij}$              22   24   26   20   25   30   29   35   40   36
(b) Hence calculate the credibility premium for Insurer I for the coming year, given that Insurer I is
expected to have a risk volume figure for the coming year of 38.
The student also needs a credibility estimate for Insurer K, one of the six insurers included in the orig-
inal analysis. He knows that, for Insurer K:
$\sum_{j=1}^{10} y_{Kj} = 986$ and $\sum_{j=1}^{10} P_{Kj} = 327$
(ii) Explain whether the credibility premium for Insurer K (based on the full analysis of the seven insur-
ers) will be greater or less than the corresponding figure for Insurer I (per unit of risk volume).
BASICS ASSIGNMENT
SOLUTIONS
SUMMARISING DATA
ANSWER 1. ANSWER 5.
0.35 ANSWER 6.
595
ANSWER 3.
(iii) £23,800
ANSWER 8.
(iv) £42,000
– 2.92
ANSWER 4. Negatively skewed
76.5 years
ANSWER 9.
ANSWER 12.
ANSWER 10.
(i) 73,184
(a) – 119.498.484
(ii) 18.57, 4.70
(b) – 0.15064
(iii) 74.66 years, 8.29 years
ANSWER 11.
ANSWER 13.
Mean = 0.35
8.8
Standard Deviation = 0.67232
BASIC PROBABILITY
ANSWER 14. ANSWER 19.
1/3 (i) 0.0758
ANSWER 21.
ANSWER 17.
(i) 21/20 0.06
(ii) 11/50
ANSWER 22.
(iii) 17/50
11/15
ANSWER 18.
(i) 0.01008
(ii) 0.992
RANDOM VARIABLE
ANSWER 23. ANSWER 27.
Probability function, P(X=x), is: Expectation = 6.3
P(X = 0) = 0.25 Standard Deviation = 1.62
P(X = 1) = 0.5
ANSWER 28.
P(X = 2) = 0.25
(i) 16.3
ANSWER 24. (ii) 17.5
1/30 (iii) 0.285
PROBABILITY DISTRIBUTION
ANSWER 32. ANSWER 33.
(ii) 1/256
ANSWER 34.
(iii) 247/256 0.1114
(ii) 2.28%
ANSWER 41.
(iii) 49.72%
0.128
ANSWER 37.
ANSWER 42.
(a) 0.0454
(i) 0.94887
(b) 0.0155
(ii) 0.12604
ASSIGNMENT 1
SOLUTIONS
GENERATING FUNCTIONS
ANSWER 1. ANSWER 7.
1 (i) e M X 2t
3t
ANSWER 2. (ii)
N 2 3,4
2
distribution.
1 t 1 t 1 t
e 2e 2e
t 2t 2t ANSWER 8.
(i) E(Y) = 8
ANSWER 3.
(ii) Sd Dev = 5.6569
1/5, 1/25
(iii) 20643840
ANSWER 4.
ANSWER 9.
Proof t
pe
(i) MU t t
ANSWER 5. 1 qe
0.09158 qe
t
(ii) CGF 1 t
1 qe
ANSWER 6.
t ANSWER 10.
(i) e
t 2 t R
Ke
(i) (a)
(ii) E X
1
2 t
(b) t<2
1
(iii) V X 2 2R
(ii) k 2e
JOINT DISTRIBUTION
ANSWER 13. ANSWER 19.
(i) 9/35 0.02
(iii) –1/450
CONDITIONAL EXPECTATION
ANSWER 34. ANSWER 38.
18 56
ASSIGNMENT 2
SOLUTIONS
THE CENTRAL LIMIT THEOREM
ANSWER 1. ANSWER 7.
(b) 0.0433
ANSWER 2.
0.71634 ANSWER 8.
0.06178
ANSWER 3.
0.15721 ANSWER 9.
0.91356
ANSWER 4.
ANSWER 5.
ANSWER 12.
0.840
0.06681
ANSWER 6.
ANSWER 13.
0.43794
0.85282
Var = 8522.7
ANSWER 21.
ANSWER 15. (i) 0.995
(iv) 10%
ANSWER 16.
(i) 0.025 ANSWER 22.
(ii) 0.99 (i) 0.0127
(iv) 0.998
ANSWER 17.
(v) 0.0114
alpha = 2.3
beta = 0.345
ANSWER 18.
0.28638
ANSWER 19.
(a) 4.765
(b) 0.2711
ASSIGNMENT 3
SOLUTIONS
POINT ESTIMATION
ANSWER 1. ANSWER 8.
0.1940 3.057
ANSWER 2. ANSWER 9.
5.67 2n 1 4
(i) 2
n
ANSWER 3.
(ii) Estimator is consistent
p = 0.2621
(i) PROOF
ANSWER 6.
(ii) PROOF
0.0443
(iii) 0.624
ANSWER 7.
(iv) 54%
0.102
ANSWER 13.
(i) 1
2
(ii)
1
ANSWER 14.
(i) PROOF
n n
(ii) FX MAX
x F x 1 1 x
(iii) PROOF
(iv) 0.01259
1 771 log771
(v) log771 24
0
1 771
This equation cannot be solved algebraically. A numerical method will be needed to solve it.
(vi) We cannot use the usual method of moments approach unless we know all the individual sample val-
ues (or at least the mean of the sample). So we do not have sufficient information to use the method of
moments approach here.
ANSWER 15.
1 1
(i) Since 0 P X x 1 , using this for each of the probabilities gives lower bounds for of , and
16 6
3 1 7 1 5
. Hence, . We also obtain upper bounds for or of of , and .
8 16 16 6 8
1
Hence .
6
(ii) 0.0083
(iii) PROOF
(iv) 0.0929
ANSWER 16.
X = 1, 11678
X = 2, 779
X = 3, 35
X = 4, 1
X = 5, 0
X > = 6, 0
(ii) PROOF
X = 1, 10945
X = 2, 1048
X = 3, 90
X = 4, 7
X = 5, 1
X> = 6, 0
(iv) For a Poisson distribution, the mean and variance are the same. Since the sample mean and variance
(which, for a sample as large as this, should be very close to the true values) are 0.13345 and 0.14304,
which differ significantly, this suggests that the Poisson distribution may not be a suitable model here.
The negative binomial distribution has more flexibility and can accommodate different values for the
mean and variance (provided the variance exceeds the mean).
CONFIDENCE INTERVAL
ANSWER 17. ANSWER 22.
(126.5, 137.5) (0.140, 0.220)
(ii) (0,10.0)
ANSWER 25.
ANSWER 20. (0.0908, 0.2203)
(116.7, 136.5)
ANSWER 21.
(0.00127, 0.249)
ANSWER 26.
(b) We have assumed that the number of hours that actuarial students spend watching television
has a normal distribution.
(iii) (a) For large samples, the confidence interval for the mean will eventually converge on the sample
mean, which should be equal to the true mean, whereas the prediction interval will not converge
to a single value but to an interval of the distribution.
(b) Unlike confidence intervals for the mean, which are concerned with the centre of the distribution,
prediction intervals take account of the tails as well as the centre. Hence, prediction intervals
have greater sensitivity to the assumption of normality.
ANSWER 27.
(0.4735, 0.9968)
ANSWER 28.
(i) (a) B appears to have a slightly smaller spread (but it is hard to tell with so few data points).
The difference in the spread doesn't appear to be significant, so the assumption of equal
variances can be allowed to stand.
Since this interval contains zero there is insufficient evidence to suggest that A and B give different
valuations.
ANSWER 29.
(i) (a) 0.11494
(b) 0.000661
(b) This confidence interval is narrower as it is based upon the exact result, whereas in part (i) (c) it
was based on a relatively small sample of 20. A larger sample would have given a narrower
interval.
ANSWER 30.
(i) (a) 14
(b) 11
(ii) (a) 13
(b) 4
(iii) Comment
For the confidence intervals the sample sizes are similar, but larger in the case where less information
is known. In general, prediction intervals are wider than confidence intervals and so a larger
sample is needed to get the same width. However, in this case, the prediction intervals vary due to the
vast difference in the tails of the t distribution.
ANSWER 31.
(7.632, 7.744)
ANSWER 32.
(b) Since the confidence interval contains the value 0, there is insufficient evidence to conclude that
the new screening programme significantly reduces the mean claim amount.
(b) Since the confidence interval contains 1, this means that we are reasonably confident that the
population variances are the same.
(iii) 16
HYPOTHESIS TESTING
ANSWER 33. ANSWER 34.
ANSWER 35.
(i) 59.9
(ii) 0.864
ANSWER 36.
The average IQ of university students is greater than 100
ANSWER 37.
The long term average annual rainfall has increased from its former level.
ANSWER 38.
the proportion of male carriers in the population is less than 10%.
ANSWER 39.
The true claim frequency is less than 0.175.
ANSWER 40.
The patients on the special diet have the same blood pressure as patients on the normal diet.
ANSWER 41.
(i) the mean performance is greater with the additive than without
ANSWER 42.
There is no difference in the variances of the two populations.
ANSWER 43.
the proportion of claims due to burglaries in the year just ended is not greater than the proportion in the
previous year.
ANSWER 44.
the training course does not increase employees’ efficiency.
ANSWER 45.
We have no evidence that the die is not fair.
ANSWER 46.
there has been no change in the pattern of causes of death.
ANSWER 47.
A Poisson model does not provide a good model for the number of claims.
ANSWER 48.
The underlying distribution is binomial.
ANSWER 49.
(a) No differences among the population proportions have been detected.
ANSWER 50.
The level of injury is almost certainly dependent on whether the victim is wearing a seatbelt.
ANSWER 51.
The characteristic is not dependent on the mother’s age.
ANSWER 52.
(i) Sufficient evidence at the 10% level to reject $H_0$
ANSWER 53.
(i) (a) It is reasonable to conclude that $\mu = 15$.
(ii) It is reasonable to conclude that $\sigma^2 = 20$
ANSWER 54.
It is reasonable to conclude that p = 2/3
ANSWER 55.
(i) (a) Sensitivity = 96.1%
(b) Specificity = 90.6%
(ii) (a) 288
(b) 256
ANSWER 56.
10%
ANSWER 57.
2 2
(ii) We conclude that 1 2
ANSWER 58.
(i) We conclude that H C
ANSWER 59.
(i) (a) 2.585
(b) conclude that the model is a good fit
(ii) We conclude that 3, ie the patient does have anaemia.
ANSWER 60.
The classification into the three AIDS statuses is not independent of the presence or absence of the alleles.
ANSWER 61.
(i) There is an association between single parent families and being in trouble with the police.
(ii) However, the presence of an association does not justify the politician's assumption that single par-
ents cause crime. There may be some other underlying causes (eg education levels, poverty) that in-
fluence family circumstances and crime rates together.
ANSWER 62.
(i) (a) Each house independently must have the same probability of being burgled
(b) 91/600
(d) These are very similar to the observed frequencies, which implies that the model is a good fit.
ANSWER 63.
(i) There is perhaps some very slight evidence of concentration at the centre of the distribution for A, but
the sample sizes are small and it is difficult to tell whether an assumption of normality is reasonable.
The variance of the data from Company B looks slightly smaller than that from Company A. However,
it is unlikely that such a small difference is significant. There are no outliers in either distribution.
2 2
(ii) Reasonable to conclude that A B
(iii) The level of premiums charged by Company B is the same as that charged by Company A
(b) Since this confidence interval contains zero, we cannot conclude that the proportions of pre-
miums in excess of 200 are different for the two companies.
(v) The company has increased its premiums since the previous year.
ASSIGNMENT 4
SOLUTIONS
CORRELATION
ANSWER 1.
Here we can see that there appears to be a strong positive linear relationship. The plotted data points lie
ANSWER 2.
It can clearly be seen that the data displays a non-linear relationship, since the rate of change in the
interest rate increases with the leverage ratio.
ANSWER 3.
Proof
ANSWER 4.
r = 0.95824
As expected, this is high (close to +1), and indicates a strong positive linear relationship.
ANSWER 5.
r = 0.87108
ANSWER 6.
rs = 0.9636
As expected, the Spearman’s rank correlation coefficient is very high, since it is known from the calculation
of the Pearson’s correlation coefficient that there is a strong positive linear relationship (hence a strong
monotonically increasing relationship).
ANSWER 7.
For the corporate borrowing data, the ranks of the two data are exactly equal, hence Spearman’s rank cor-
relation coefficient is trivially equal to 1.
The reason that this is materially higher than the equivalent Pearson coefficient is because the non-linearity
of the relationship does not feature in the calculation, only the fact that it is monotonically increasing.
ANSWER 8.
τ = 0.8667
The relatively high value demonstrates the strong correlation between the variables.
ANSWER 9.
For the corporate borrowing data, clearly all the pairs are concordant, and so τ is trivially equal to 1.
ANSWER 10.
Test statistic = 9.478
ANSWER 11.
p-value = 0.12
ANSWER 12.
Spearman’s rank correlation
ANSWER 13.
(i) $S_{xx} = 70$, $S_{yy} = 3.015$ and $S_{xy} = 14.3$
(ii) r = 0.984336
There is a strong linear association between gestation period and foetal weight.
(iii) The ranks of the two variables (gestation period and weight) are exactly equal, hence Spearman's
rank correlation coefficient is equal to 1.
This means that all the pairs are concordant, and so Kendall's $\tau$ is also equal to 1.
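As a check on part (ii), the correlation coefficient can be recomputed from the three sums in part (i) using r = Sxy / sqrt(Sxx × Syy). The variable names below are illustrative.

```python
from math import sqrt

# Sums of squares and cross-products quoted in part (i)
sxx = 70
syy = 3.015
sxy = 14.3

# Pearson correlation coefficient
r = sxy / sqrt(sxx * syy)
print(round(r, 6))
```

The result should agree with the value quoted in part (ii) to the precision shown there.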
ANSWER 14.
(i) (a) Pearson correlation coefficient : r = 0.81045
(ii) There is strong positive correlation between class size and GCSE results (ie bigger classes have better
GCSE results).
However, correlation does not necessarily imply causation, ie whilst bigger classes have better results,
it is not necessarily the class size that causes the improvement.
ANSWER 15.
(i) (a) Since we have tied ranks we cannot use the simplified formula for Spearman or Kendall.
(b) r = 0.85860
LINEAR REGRESSION
ANSWER 16.
There appears to be a strong positive linear relationship and so fitting a linear regression model is appropriate.
CA PRAVEEN PATWARI 119 JAI SHREE RAM
CS1 ASSIGNMENTS & SOLUTIONS ACTUATORS EDUCATIONAL INSTITUTE
ANSWER 17.
If we take logs, the relationship becomes:
ANSWER 18.
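$\hat{y} = \hat{\alpha} + \hat{\beta} x$
At $x = \bar{x}$: $\hat{y} = (\bar{y} - \hat{\beta}\bar{x}) + \hat{\beta}\bar{x} = \bar{y}$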
ANSWER 19.
ANSWER 20.
£325
ANSWER 21.
SSTOT = 7.1588
SSREG = 6.5734
SSRES = 0.5854
ANSWER 22.
0.918
ANSWER 23.
±0.958
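The figures in Answers 21 to 23 can be cross-checked with a short sketch. It assumes (the questions are not restated here) that Answer 22 is the coefficient of determination R² = SSREG / SSTOT and that Answer 23 is its square root.

```python
from math import sqrt

# Sums of squares from Answer 21
ss_tot = 7.1588
ss_reg = 6.5734
ss_res = 0.5854

# The total sum of squares splits into regression and residual parts
partition_gap = abs(ss_reg + ss_res - ss_tot)

# Proportion of variation explained, and the magnitude of the correlation
r_squared = ss_reg / ss_tot
r_magnitude = sqrt(r_squared)

print(round(r_squared, 3), round(r_magnitude, 3))
```

The sign of the correlation coefficient cannot be recovered from R² alone; it matches the sign of the fitted slope.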
ANSWER 24.
(a) (0.668, 1.10)
(b) The 95% two-sided confidence interval in (a) contains the value ‘1’, so the two-sided test in (b) conducted at the 5% level results in H0 not being rejected.
ANSWER 25.
p-value = 89.8
ANSWER 26.
(a) (£392, £452)
ANSWER 27.
The residuals, $\hat{e}_i = y_i - \hat{y}_i$, are given in the table below:
x_i   2.10   2.40   2.50   3.20   3.60   3.80   4.10   4.20   4.50   5.00
ê_i   0.163  –0.221 0.171  –0.377 0.330  –0.266 0.239  –0.159 0.246  –0.125
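A quick sanity check on a table of least-squares residuals: for a fitted line with an intercept, the residuals sum to zero and are uncorrelated with x, so both Σê_i and Σx_i ê_i should be close to zero (up to rounding in the tabulated values). The sketch below uses the values from the table; the variable names are illustrative.

```python
# Values copied from the residual table
x = [2.10, 2.40, 2.50, 3.20, 3.60, 3.80, 4.10, 4.20, 4.50, 5.00]
e = [0.163, -0.221, 0.171, -0.377, 0.330, -0.266, 0.239, -0.159, 0.246, -0.125]

# Both sums are exactly zero for unrounded least-squares residuals
sum_e = sum(e)
sum_xe = sum(xi * ei for xi, ei in zip(x, e))

print(round(sum_e, 4), round(sum_xe, 4))
```

Both sums are small relative to the size of the individual residuals, which is what rounding to three decimal places would produce.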
ANSWER 28.
The basic model is:
$E[Y \mid x_1, x_2] = \alpha + \beta_1 x_1 + \beta_2 x_2$
Here $x_1$ represents the number of exam passes, $x_2$ represents the number of years' experience and Y would represent the corresponding salary.
$\alpha$ reflects the average salary for a new student (with no exam passes or experience).
$\beta_1$ and $\beta_2$ reflect the changes in pay associated with an extra exam pass and an extra year's experience, respectively.
Since the data relates to 50 (= n) students, we need to introduce an extra subscript i corresponding to the i th student. So the actual salary for the i th student will be:
$Y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + e_i$
where $e_i$ is the difference between the student's actual salary and the theoretical salary for someone with the same number of exam passes and experience.
ANSWER 29.
(c) $\hat{\sigma}^2 = 0.0234$.
(ii) 3.98 kg
SSREG = 2.921
SSRES = 0.094
(b) 96.9%
(b) (1.99,2.30)
(b) 0.0287
(b) All values are between $\pm 3\hat{\sigma} = \pm 3\sqrt{0.0234} = \pm 0.46$, so there appear to be no outliers.
There may be possible skewness but it’s difficult to tell with such a small dataset.
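The three-standard-deviation screen used above can be sketched as follows. The variance estimate 0.0234 is the value quoted in part (c); the residual list is hypothetical and only illustrates how the screen is applied.

```python
from math import sqrt

sigma2_hat = 0.0234           # estimated error variance from part (c)
bound = 3 * sqrt(sigma2_hat)  # residuals beyond +/- bound would be flagged

def flag_outliers(residuals, limit):
    """Return the residuals whose absolute value exceeds the limit."""
    return [res for res in residuals if abs(res) > limit]

# Hypothetical residuals, not the ones from the question
example_residuals = [0.12, -0.20, 0.05, -0.09, 0.31, -0.19]
flagged = flag_outliers(example_residuals, bound)

print(round(bound, 2), flagged)
```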
(d) One of the values is way off the diagonal line which indicates that the data set may be non-
normal and hence the full normal linear regression model may not be appropriate.
ANSWER 30.
(i) (a) 0.66393
(b) 4.184
(iii) These tests are equivalent. Testing whether there is any correlation is equivalent to testing if the slope
is not zero (ie it is sloping upwards and there is positive correlation or it is sloping downwards and
there is negative correlation). So the tests give the same statistic and p-value.
ANSWER 31.
$R^2 = 0.64$
ANSWER 32.
2
(i) Let Yi y i and X i x i .
$\ln y_i = \ln a + b x_i$
ANSWER 33.
(b) (13.8,64.2)
(iv) (56.2,67.2)
(b) 73.7% of the variation is explained by the model, which indicates that the fit is fairly good. It still
might be worthwhile to examine the residuals to double check that a linear model is appropriate
ANSWER 34.
ŷ 32.47 118.117x
(b) (4350,89100)
SSREG = 837,093
SSRES = 88,169
(b) $R^2$ = 90.5%
(c) This tells us that 90.5% of the variation in the prices is explained by the model. Since this leaves
only 9.5% from other non-model sources, it would appear that the model is a very good fit to the
data.
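The coefficient of determination quoted in this answer can be recovered from the two sums of squares given with it, using the standard decomposition of the total sum of squares into regression and residual parts:

```latex
R^2 = \frac{SS_{REG}}{SS_{REG} + SS_{RES}}
    = \frac{837{,}093}{837{,}093 + 88{,}169}
    = \frac{837{,}093}{925{,}262}
    \approx 0.905
```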
(iv) (a) x 1 eˆ 45
x4 eˆ 110
x 8 eˆ 183
(b) Since $e_i \sim N(0, \sigma^2)$, we would expect the dotplot to be normally distributed about zero. This does not appear to be the case, but it is difficult to tell with such a small data set.
(c) Clearly this is not patternless. The residuals are not independent of the time. This means that the
linear model is definitely missing something and is not appropriate to these data.
ANSWER 35.
(i) The fitted regression line is: p 0.16901c 0.75836
ANSWER 36.
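(i) $\hat{\beta} = \dfrac{\sum x_i Y_i}{\sum x_i^2}$
(ii) $\text{Bias}(\hat{\beta}) = 0$ ; $\text{MSE}(\hat{\beta}) = \dfrac{\sigma^2}{\sum x_i^2}$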
ANSWER 37.
(i) (a) $\ln \mu_x = \ln B + x \ln c$
(ii) The graph appears to show an approximately linear relationship and this supports the transformation
in part (i)(a). However, it does appear to have a slight curve and this would warrant closer inspection
of the model to see if it is appropriate for the data.
(b) This tells us that 95.7% of the variation in the data can be explained by the model and so indicates an extremely good overall fit of the model.
(c)
Age, X 30 32 34 36 38 40 42 44
(d) The residuals should be patternless when plotted against X, however it is clear to see that some
pattern exists - this indicates that the linear model is not a good fit and that there is some other
variable at work here.
ANSWER 38.
(ii) 0.989
(b) (800,5360)
ANSWER 39.
(i) Proof
(ii) Proof
ANSWER 40.
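(i) (a) $\sum y_i = n\alpha + \beta_1 \sum x_{i1} + \beta_2 \sum x_{i2}$
$\sum y_i x_{i1} = \alpha \sum x_{i1} + \beta_1 \sum x_{i1}^2 + \beta_2 \sum x_{i2} x_{i1}$
$\sum y_i x_{i2} = \alpha \sum x_{i2} + \beta_1 \sum x_{i1} x_{i2} + \beta_2 \sum x_{i2}^2$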
(ii) The p-values for all the parameters are less than 0.05 and so they are all significantly different from
zero.
(iii) 0.9984
(iv) Test statistic = 1,280
(v) 92.1%
(vi) The first plot appears to be random and there is no discernible increase in the variance - so this would
imply that the model meets these assumptions. Point 1 (92.5%) does appear to be an outlier. But it is
difficult to tell with such a small dataset.
With the exception of point (1) the rest of the values lie along the diagonal line thus implying a normal
distribution is appropriate.
(vii) (a) If there is interaction between the two drugs then there is an additional effect caused when both
are present compared with what would be expected if they were each administered singly.
(c) The model with just the two drugs as main effects had an adjusted $R^2$ of 0.9984 in part (iii), whereas the new model with the interactive effect has an adjusted $R^2$ of 0.9969.
Since there is a decrease in the value of the adjusted $R^2$, the previous model would be considered the 'best' model, as the interaction term does not improve the fit enough to justify the extra parameter.
Proof
ANSWER 42.
i j i x jx
ANSWER 43.
1
y i 0
ˆ ˆ x i
xi
x i y i 0
ˆ ˆ x i
ANSWER 44.
Proof
ANSWER 45.
A variable is a type of covariate (eg age) whose actual numerical value enters the linear predictor directly,
and a factor is a type of covariate (eg sex) that takes categorical values.
ANSWER 46.
1 y
(ii) (a) f y exp In
1
1 1
b' b"
2
2
ANSWER 47.
(i) $f(y) = \exp\left\{ \dfrac{y \log \mu - \mu}{1} - \log y! \right\}$
(iii) –0.66498
(v) We can use the chi-squared distribution to compare Model A with Model B. We calculate the difference in the scaled deviances (which is just $2(\log L_A - \log L_B)$):
This should have a chi-squared distribution with 3 – 1 = 2 degrees of freedom, which has a critical value at the upper 5% level of 5.991. Our value is significant here, since 10.10 > 5.991, so this suggests that Model A is a significant improvement over Model B. We prefer Model A here.
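The critical value and the comparison in part (v) can be reproduced without tables. With 2 degrees of freedom the chi-squared distribution is exponential with mean 2, so its upper-tail probability is exp(-x/2) and the upper critical value at level alpha is -2 ln(alpha). The sketch below relies on that special case only; the function names are illustrative.

```python
from math import exp, log

def chi2_upper_tail_2df(x):
    """P(X > x) for a chi-squared variable with 2 degrees of freedom."""
    return exp(-x / 2)

def chi2_critical_2df(alpha):
    """Upper critical value at level alpha, 2 degrees of freedom."""
    return -2 * log(alpha)

deviance_difference = 10.10  # value quoted in part (v)

critical = chi2_critical_2df(0.05)
p_value = chi2_upper_tail_2df(deviance_difference)
reject = deviance_difference > critical

print(round(critical, 3), round(p_value, 4), reject)
```

For other degrees of freedom there is no such closed form, so a statistical table or library would be needed.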
ANSWER 48.
y
(i) f y exp log
(ii) 1 /
2
(iii) Variance function is
ANSWER 49.
i
y i In In 1 i
1 i n
(i) f y i exp In
1/n ny i
(ii) b" i 1 i
ANSWER 50.
2
(ii) E X ; var X
ANSWER 51.
ˆ yi
2 In i 1
yi
ˆ
i
ANSWER 52.
(i) Model 1: i j k
Model 2: ij k
Model 3: ijk
(ii) Model 1 does not allow for the possibility that there may be interactions (correlations) between some
of the factors. For example, it may be the case that young drivers tend to drive fast cars and to live in
towns.
With Model 3, which is a saturated model, it would be possible to fit the average values for each group
exactly ie there are no degrees of freedom left. This defeats the purpose of applying a statistical model,
as it would not 'smooth' out any anomalous results.
(iii) Normal error structure means that the randomness present in the observed values in each category
(eg young/fast/town) is assumed to follow a normal distribution.
The link function is the function that connects the linear predictor to the predicted (mean) values. Associated with each type of error structure is a 'canonical' or 'natural' link function. In the case of a normal error structure, the canonical link function is the identity function, $g(\mu) = \mu$.
(iv) (a) The completed table, together with the differences in the scaled deviance and degrees of freedom, is shown below.
Model                              Scaled deviance   Degrees of freedom   Difference in scaled deviance   Difference in degrees of freedom
Constant: 1                        50                7
Model 1: YO + FS + TC              10                4                    40                              3
Model 2: YO + FS + YO.FS + TC      5                 3                    5                               1
Model 3: YO*FS*TC                  0                 0                    5                               3
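The difference columns in the table follow directly from the scaled deviances and degrees of freedom of successive nested models. The sketch below recomputes them and, as an illustration only, compares each difference with the upper 5% chi-squared point for the corresponding degrees of freedom. The critical values 3.841 (1 df) and 7.815 (3 df) are standard table values, not figures taken from this answer.

```python
# (label, scaled deviance, degrees of freedom), in nesting order
models = [
    ("Constant", 50, 7),
    ("Model 1", 10, 4),
    ("Model 2", 5, 3),
    ("Model 3", 0, 0),
]

# Upper 5% points of the chi-squared distribution (standard table values)
critical_5pct = {1: 3.841, 3: 7.815}

# Differences between each model and the simpler model before it
diffs = [
    (larger[0], smaller[1] - larger[1], smaller[2] - larger[2])
    for smaller, larger in zip(models, models[1:])
]

# True where the extra terms give a significant reduction in deviance
significant = [dev > critical_5pct[df] for _, dev, df in diffs]

for (label, dev, df), sig in zip(diffs, significant):
    print(label, dev, df, sig)
```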
ANSWER 53.
(i) In i is the natural link function.
17
x i
(ii) y ie 17
i 1
17 17
x i
x i y ie xi
i 1 i 1
(iii) (5.233,11.721)
ASSIGNMENT 5
SOLUTIONS
BAYESIAN STATISTICS
ANSWER 1. ANSWER 6.
ANSWER 2. ANSWER 7.
61 2/3
Gamma 55,
30
ANSWER 8.
ANSWER 3.
3
Gamma x 2, .
p must have a beta distribution. 2
ANSWER 4. ANSWER 9.
P 2| X 3 0.58806
P 10| X 7 0.32954
P 3| X 3 0.29205
P 12| X 7 0.15980
(i) 3
ANSWER 10.
(ii) 2.972
(i) Beta(x + 4, n– x + 4)
(iii) 2.917
(ii) 0.45161
(c) 0.513
(iii) (0.217, 1.00)
x + log2
ANSWER 12.
ANSWER 13. 5
(ii) Beta 6, n i 4
i 1
(i) Beta 2,n1
(iii) The two estimates are different. The Bayesian estimate of p under quadratic loss is the value of g that
minimises the expected posterior loss:
$\int_0^1 (g - p)^2 f_{post}(p)\,dp$
The maximum likelihood estimate of p is the value of p that maximises the likelihood function.
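For reference, the minimiser of the expected posterior quadratic loss is the posterior mean. A short derivation, writing $f_{post}(p)$ for the posterior density of p on (0, 1):

```latex
\frac{d}{dg} \int_0^1 (g - p)^2 f_{post}(p)\,dp
  = 2 \int_0^1 (g - p) f_{post}(p)\,dp
  = 2\left( g - E[p \mid \text{data}] \right) = 0
\quad\Longrightarrow\quad
g = E[p \mid \text{data}]
```

The second derivative with respect to g is 2, which is positive, so this stationary point is a minimum.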
ANSWER 14.
0.39722
CREDIBILITY THEORY
ANSWER 15.
£70
ANSWER 16.
(i) £3.50
ANSWER 17.
First, let's consider what happens when n increases. In A, Z increases. In B, Z is unaffected by n. In C, Z de-
creases. In practice, we want Z to increase as n increases, so B and C are inappropriate.
Now consider what happens when $\sigma_0^2$ (the variance of the prior distribution) increases. In all cases Z increases. In practice, we want Z to increase, so all expressions are appropriate in this respect.
Finally, consider what happens when $s^2$ (the sample variance) increases. In A, Z decreases. In B and C, Z is unaffected by $s^2$. In practice, we want Z to decrease, so B and C are inappropriate.
ANSWER 18.
nx 600
2
2
1
(i) N 200 50 ,
n 1 n 1
2
2 2
2
200 50 200 50
n 1
2 2
(ii) 200 x 50 600
n 1 n 1
2
2 2
2
200 50 200 50
(iii) 0.669
ANSWER 19.
e
x / i
(i) n
x i /
e
(ii) (a) n 1
xi
(b)
n 1
n
(iii) Z
n 1
(iv) 0.9950
(v) The value of Z is very close to 1. So the credibility estimate is very close to the sample mean (98.26),
and takes little account of the prior mean (80). This is because n is much bigger than .
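The remark in part (v) can be made concrete with the usual credibility weighting, Z × (sample mean) + (1 − Z) × (prior mean), using the three figures quoted above. The function name is illustrative.

```python
def credibility_estimate(z, sample_mean, prior_mean):
    """Weighted average of the sample mean and the prior mean."""
    return z * sample_mean + (1 - z) * prior_mean

z = 0.9950           # credibility factor from part (iv)
sample_mean = 98.26  # sample mean quoted in part (v)
prior_mean = 80      # prior mean quoted in part (v)

estimate = credibility_estimate(z, sample_mean, prior_mean)
print(round(estimate, 4))
```

With Z this close to 1, the estimate moves less than 0.1 away from the sample mean.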
ANSWER 20.
1
2
(i) 2
1
2
1 1
n d
(b)
n n n
(iii) Increasing will reduce the denominator, and hence increase the value of Z.
(iv) Increasing the standard deviation of the prior distribution means that there is greater uncertainty as-
sociated with the prior distribution, and hence it is less reliable. So, when estimating , we should put
less weight on the prior mean and more weight on the maximum likelihood estimate of (which is
calculated from the sample data alone). To achieve this the credibility factor, Z, must increase. The
formula in (iii) illustrates this.
1 5
2
x 4j x4
4 j 1
132.7
(ii) Z = 0.81695
Country 2: 67.37
Country 3: 71.94
Country 4: 59.03
ANSWER 22.
(i) £2,129,300
(ii) £2,397,400
ANSWER 23.
(a) This is false. is just a risk parameter that reflects the likelihood of claims. The true risk premium for
a given risk is E m | X .
(c) This is true. In fact, none of the quantities in the model are assumed to have any specific type of distribution.
ANSWER 24.
Country Estimated credibility factor Risk premium per unit volume EBCT premium
ANSWER 25.
(i) 112.75
(ii) We could use EBCT Model 2, with risk volumes Pi,j for Quarter i of Year j of:
$P_{1,4} = 1$, $P_{1,3} = \dfrac{1}{1.1}$, $P_{1,2} = \dfrac{1}{1.1^2}$, $P_{1,1} = \dfrac{1}{1.1^3}$
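These risk volumes can be generated with a short sketch. It assumes, as the 1.1 denominators suggest, a growth factor of 1.1 per year with Year 4 as the base year; the variable names are illustrative.

```python
growth_factor = 1.1  # assumed annual growth factor, read off the denominators
base_year = 4        # Year 4 is given a risk volume of 1

# Risk volume for Quarter 1 of Year j, discounted back from the base year
risk_volumes = {
    year: growth_factor ** -(base_year - year)
    for year in range(1, base_year + 1)
}

for year, volume in sorted(risk_volumes.items()):
    print(year, round(volume, 4))
```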
(iii) 102.417
(iv) In the EBCT approach of parts (i) and (ii), we are not assuming any particular distribution for the ran-
dom variables Yi,j . In the maximum likelihood approach in part (iii), we are explicitly assuming that
Also, the EBCT approach assumes that the data from all 4 quarters provide us with information about
Quarter 1, whereas the maximum likelihood approach only considers the data from Quarter 1.
ANSWER 26.
(i) £66.79m
(ii) £66.18m
(iii) The two models give fairly similar results. The estimate in Model 2 will depend on the prediction of
risk volume for the coming year.
In both cases we have used a very high value for the credibility factor. So we are effectively ignoring the
data from the other insurers, and are basing our estimate almost entirely on the data from Insurer B.
This seems sensible, given that both the volume figures and the average claim amounts appear to be
quite variable between the three different insurers. This suggests that we should not place too much
emphasis on the data from Insurers A and C, and focus on the information that we have for Insurer B.
ANSWER 27.
(b) 150.024
So we place more emphasis on the mean of the direct data for Insurer K, and this reduces the credibility estimate. As a result, the credibility premium per unit of volume will be lower for Insurer K than for Insurer I.