Astronomical Statistics: Tutorial Questions 1: John Peacock
John Peacock
C20, Royal Observatory; jap@roe.ac.uk
October 2012
1. A test for cancer is known to be 90% accurate either in detecting cancer if present or
in giving an all-clear if cancer is absent. The prevalence of cancer in the population is
1% – how worried should you be if you test positive? (hint: apply Bayes’ Theorem).
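A minimal numerical sketch of the Bayes' Theorem hint, assuming that "90% accurate" means both the detection rate when cancer is present and the all-clear rate when it is absent are 0.90. The function name and its parameter names are illustrative, not part of the question.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(cancer | positive test) from Bayes' Theorem."""
    # P(positive and cancer)
    true_pos = sensitivity * prevalence
    # P(positive and no cancer)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    # Normalise by the total probability of a positive result
    return true_pos / (true_pos + false_pos)


# Assumed reading of the question: 1% prevalence, 90% in both directions
p = posterior_positive(prevalence=0.01, sensitivity=0.90, specificity=0.90)
print(f"P(cancer | positive) = {p:.4f}")
```

Varying the prevalence argument shows how strongly the answer depends on the prior.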
2. A game-show host invites you to choose one of three doors for a chance to win a car
(behind one door) or a cuddly toy (behind the other two). You choose a door, but
the host opens a different door to reveal a toy – and asks you if you want to switch
your choice to the other unopened door. Does switching improve your chance of
winning the car? Try to justify your answer using both frequentist and Bayesian
reasoning. (Hint: from the frequentist point of view, the three permutations CTT,
TCT, TTC are equally likely).
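The frequentist hint can be checked by exhaustive enumeration. The sketch below weights the three arrangements CTT, TCT, TTC equally and assumes (this is not stated in the question) that the contestant's first choice is uniform over the doors and that the host picks at random when two toy doors are available to open. The function name is illustrative.

```python
from fractions import Fraction


def win_probability(switch):
    """Exact probability of winning the car, with or without switching."""
    arrangements = ["CTT", "TCT", "TTC"]  # equally likely, as in the hint
    wins = Fraction(0)
    for doors in arrangements:
        for choice in range(3):
            # Host must open a toy door other than the contestant's choice
            host_options = [d for d in range(3)
                            if d != choice and doors[d] == "T"]
            for opened in host_options:
                weight = Fraction(1, len(arrangements) * 3 * len(host_options))
                if switch:
                    # Move to the one door that is neither chosen nor opened
                    final = next(d for d in range(3)
                                 if d not in (choice, opened))
                else:
                    final = choice
                if doors[final] == "C":
                    wins += weight
    return wins


print("stay:  ", win_probability(switch=False))
print("switch:", win_probability(switch=True))
```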
3. A detector receives 1400 photons from a region of sky containing a possible source.
1100 photons are received from a blank piece of sky of the same area, in the same
exposure time. What is the significance of the detection of the source? (state your
assumptions, and list possible complicating effects that cannot be addressed with
this information). Another observer then detects 1150 photons from another blank
piece of sky of the same area in the same time. Does the significance of the detection
go up or down?
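One possible way to set up the calculation is sketched below. It assumes Poisson counting statistics in the Gaussian limit (variance equal to the counts), independent measurements, and that several blank-sky fields are simply averaged to estimate the background. These are assumptions to be stated in an answer, not facts given in the question, and the function name is illustrative.

```python
import math


def detection_significance(on_counts, off_counts):
    """Signal-to-noise of (on-source counts) minus (mean blank-sky counts)."""
    n_off = len(off_counts)
    background = sum(off_counts) / n_off
    signal = on_counts - background
    # Poisson variances add; the variance of a mean of n_off fields
    # is the summed variance divided by n_off squared
    variance = on_counts + sum(off_counts) / n_off**2
    return signal / math.sqrt(variance)


one_field = detection_significance(1400, [1100])
two_fields = detection_significance(1400, [1100, 1150])
print(f"one blank field:  {one_field:.2f} sigma")
print(f"two blank fields: {two_fields:.2f} sigma")
```

The second background field lowers the estimated signal but also reduces the noise on the background estimate, so the two effects compete.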
4. For a binomial distribution, the probability of n successes out of N trials with
probability of success p is
$$P_n = C^N_n\, p^n (1-p)^{N-n}, \qquad (0 \le n \le N), \tag{1}$$
where
$$C^N_n = \frac{N!}{n!\,(N-n)!}. \tag{2}$$
This can be generalized to the multinomial distribution, in which there are possible
outcomes 1 to $m$, with probabilities $p_1$ to $p_m$. The probability of obtaining $n_1$ cases
of outcome 1 through to $n_m$ cases of outcome $m$ is
$$p(n_1 \ldots n_m) = \frac{N!}{n_1!\, n_2! \ldots n_m!}\; p_1^{n_1} p_2^{n_2} \ldots p_m^{n_m}. \tag{3}$$
Give two separate arguments to justify the form of this result.
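Before constructing the arguments, it can help to confirm numerically that equation (3) reduces to the binomial form (1) when $m = 2$. This is a small sketch only: the function name and the test values ($N = 5$, $p = 0.3$) are illustrative choices, not part of the question.

```python
from math import comb, factorial, prod


def multinomial_pmf(counts, probs):
    """Equation (3): probability of counts (n_1..n_m) given probs (p_1..p_m)."""
    coeff = factorial(sum(counts))  # N!
    for n in counts:
        coeff //= factorial(n)      # divide by each n_i!, exact in integers
    return coeff * prod(p**n for p, n in zip(probs, counts))


# m = 2 case: should agree with the binomial formula (1)
binom = comb(5, 2) * 0.3**2 * 0.7**3
multi = multinomial_pmf((2, 3), (0.3, 0.7))
print(binom, multi)
```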
5. For a Gaussian-distributed variable, $x$, with mean $\mu$ and rms $\sigma$, evaluate (i) $\langle x\rangle$; (ii)
$\langle x^2\rangle$; (iii) $\langle x^3\rangle$; (iv) $\langle x^4\rangle$. (Hint: use the properties of a Gaussian centred at zero.)
Taking $N$ independent samples, $x_i$, verify that the sample variance of the data has
an expectation value of $\sigma^2(1 - 1/N)$. What is the expectation of $\sum_i (x_i - m)^3/N$,
where $m$ is the sample mean?
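The analytic results can be compared against a Monte Carlo experiment. The sketch below takes the sample variance to be $\sum_i (x_i - m)^2/N$, which is the definition consistent with the quoted expectation. The function name, seed, and parameter values are arbitrary illustrative choices.

```python
import random


def simulate(n_samples, n_trials, mu, sigma, seed):
    """Average the sample variance and third central sample moment over trials."""
    rng = random.Random(seed)
    sum_var = 0.0
    sum_third = 0.0
    for _ in range(n_trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        m = sum(xs) / n_samples  # sample mean
        sum_var += sum((x - m) ** 2 for x in xs) / n_samples
        sum_third += sum((x - m) ** 3 for x in xs) / n_samples
    return sum_var / n_trials, sum_third / n_trials


N, mu, sigma = 5, 3.0, 1.0
mean_var, mean_third = simulate(N, 100_000, mu, sigma, seed=1)
print("mean sample variance:", mean_var)
print("sigma^2 (1 - 1/N):   ", sigma**2 * (1 - 1 / N))
print("mean third moment:   ", mean_third)
```

The estimates carry Monte Carlo noise, so agreement is only expected to within the scatter set by the number of trials.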
(i) What conditions must the constants α and β satisfy in order for this to be a
valid pdf?
(ii) Calculate the mean, $m$, and variance, $s^2$, of this distribution.
(iii) Calculate the skewness, $\langle (x-m)^3\rangle$, and kurtosis, $\langle (x-m)^4\rangle$, of this distribution.
Hints:
a. You may use the result that for a Gaussian variable:
$$\langle x^{2n}\rangle = (2n-1)!!\,\langle x^2\rangle^n$$
where
$$(2n-1)!! = (2n-1)(2n-3)(2n-5)\cdots 3\cdot 1.$$
b. If hint (a) is not a known fact, it is worth showing it by recurrence.
c. Exploit symmetries in the integrals.
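Hint (a) can be checked numerically before it is derived. The sketch below assumes a zero-mean Gaussian of rms $\sigma$ and evaluates the moment integrals with a simple midpoint rule. The function names, integration range, and step count are illustrative choices.

```python
import math


def double_factorial_odd(n):
    """(2n-1)!! = (2n-1)(2n-3)...3.1, with the empty product (n = 0) equal to 1."""
    result = 1
    for k in range(1, 2 * n, 2):
        result *= k
    return result


def gaussian_even_moment(n, sigma, half_width=12.0, steps=20000):
    """Midpoint-rule estimate of <x^(2n)> for a zero-mean Gaussian of rms sigma."""
    a = -half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += x ** (2 * n) * math.exp(-0.5 * (x / sigma) ** 2)
    return total * h / (sigma * math.sqrt(2.0 * math.pi))


sigma = 1.5
for n in range(1, 5):
    numeric = gaussian_even_moment(n, sigma)
    formula = double_factorial_odd(n) * sigma ** (2 * n)
    print(n, numeric, formula)
```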
Evaluate the variances in x and y, together with the correlation coefficient between
x and y.
New variables are defined as linear combinations corresponding to a rotation in the
$x$-$y$ plane: $x = (\cos\theta)X - (\sin\theta)Y$; $y = (\cos\theta)Y + (\sin\theta)X$. Show that, for
a suitable choice of angle, the variables X and Y are independent. What are the
variances in X and Y ?
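The effect of the rotation on the second moments can be explored numerically. The sketch below assumes zero-mean variables and uses hypothetical values for the variances and covariance of $x$ and $y$; they are not taken from the question. It inverts the rotation to get $X = x\cos\theta + y\sin\theta$, $Y = y\cos\theta - x\sin\theta$, and checks that the angle satisfying $\tan 2\theta = 2\langle xy\rangle/(\langle x^2\rangle - \langle y^2\rangle)$ removes the covariance. Zero covariance implies independence only if the joint distribution is Gaussian, which is assumed here.

```python
import math


def decorrelating_angle(var_x, var_y, cov_xy):
    """Rotation angle theta with tan(2 theta) = 2 cov_xy / (var_x - var_y)."""
    return 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)


def rotated_moments(var_x, var_y, cov_xy, theta):
    """Variances and covariance of X = c x + s y and Y = c y - s x."""
    c, s = math.cos(theta), math.sin(theta)
    var_X = c * c * var_x + 2.0 * c * s * cov_xy + s * s * var_y
    var_Y = s * s * var_x - 2.0 * c * s * cov_xy + c * c * var_y
    cov_XY = c * s * (var_y - var_x) + (c * c - s * s) * cov_xy
    return var_X, var_Y, cov_XY


# Hypothetical second moments, chosen only for illustration
var_x, var_y, cov_xy = 2.0, 1.0, 0.5
theta = decorrelating_angle(var_x, var_y, cov_xy)
var_X, var_Y, cov_XY = rotated_moments(var_x, var_y, cov_xy, theta)
print("theta =", theta)
print("var_X =", var_X, "var_Y =", var_Y, "cov_XY =", cov_XY)
```

The sum of the two variances is unchanged by the rotation, which gives a quick consistency check on any analytic answer.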