Theoretical Problem Week6
University Lecturer Pekka Pere.
$$\left[0,\ \frac{3}{n}\right] \approx \left[0,\ \frac{3}{n+1}\right]$$
yields a valid one-sided 95% confidence interval. Both formulae are approximate, and the latter approximation yields better coverage (closer to 95%). Both formulae are applicable if n > 30.
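As a numerical illustration, the sketch below compares the upper limits 3/n and 3/(n+1) with the exact one-sided 95% upper limit when y = 0, i.e. the value of π for which (1 − π)ⁿ = 0.05. That exact limit (the Clopper–Pearson bound) is not discussed in the text and is used here only as a benchmark; the function names are hypothetical.

```python
def rule_of_three(n: int) -> float:
    """Upper limit of the approximate one-sided 95% interval [0, 3/n]."""
    return 3 / n


def rule_of_three_plus_one(n: int) -> float:
    """Upper limit of the alternative approximation [0, 3/(n+1)]."""
    return 3 / (n + 1)


def exact_upper_limit(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper limit when y = 0: the pi solving (1 - pi)**n = alpha."""
    return 1 - alpha ** (1 / n)


for n in (40, 100, 500):
    print(
        f"n = {n:3d}: 3/n = {rule_of_three(n):.5f}, "
        f"3/(n+1) = {rule_of_three_plus_one(n):.5f}, "
        f"exact = {exact_upper_limit(n):.5f}"
    )
```

For each n the exact limit lies slightly below 3/(n+1), which in turn lies below 3/n, so 3/(n+1) is the closer of the two approximations.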
Formula (1) also fails if π is close to 0 or 1 and n is small. For example, if π = 0.05 and n = 25, the 95% Wald confidence interval, defined by the bounds in (1), covers π with a probability of only about 0.72. Newcomb (1998, 868) argues that confidence interval (1) should not be used in scientific research, and Andersson (2023), Fagerland et al. (2017, 65), Meeker et al. (2017, 105, 108), and Schilling and Doi (2014) concur. The reason is the poor coverage of confidence interval (1) when the sample is small and π is close to 0 or 1.
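Coverage figures like this can be checked by exact enumeration: sum the Binomial(n, π) probabilities of those outcomes y whose interval contains π. A minimal sketch, assuming that formula (1) is the usual Wald interval π̂ ± z₀.₉₇₅√(π̂(1 − π̂)/n) with π̂ = y/n; the function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{0.975}, approximately 1.96


def wald_ci(y, n, z=Z):
    """Wald interval p +/- z*sqrt(p(1 - p)/n) with p = y/n (assumed to be formula (1))."""
    p = y / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def exact_coverage(pi, n, ci=wald_ci):
    """Probability that the interval ci(Y, n) contains pi when Y ~ Binomial(n, pi)."""
    coverage = 0.0
    for y in range(n + 1):
        lower, upper = ci(y, n)
        if lower <= pi <= upper:
            coverage += comb(n, y) * pi**y * (1 - pi) ** (n - y)
    return coverage


print(f"{exact_coverage(0.05, 25):.3f}")  # prints 0.721
```

Much of the shortfall comes from y = 0: then π̂ = 0 and the interval collapses to the single point 0, which happens with probability 0.95²⁵ ≈ 0.28.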
The plus four confidence interval (Agresti and Coull 1998) is a much better 95% confidence interval when the sample is small. Four imaginary observations, two successes and two failures, are added to a sample of n observations with y successes as follows:
outcome     yes        no           Σ
frequency   y + 2      n − y + 2    n + 4
Σ stands for the sum of the frequencies of the outcomes yes and no. The plus four confidence interval is calculated from this modified sample with formula (1).
The plus four confidence interval covers the true π with a probability that tends to be much closer to 0.95 than that of the Wald confidence interval. If π is very close to 0 or 1, however, the plus four confidence interval covers π with too large a probability.
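Because the plus four interval is formula (1) applied to the modified counts, it needs only one extra line on top of a Wald helper. The sketch below, again assuming the standard Wald form for formula (1) and using hypothetical names, compares the exact coverage of the two intervals for π = 0.05 and n = 25.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{0.975}, approximately 1.96


def wald_ci(y, n, z=Z):
    """Wald interval p +/- z*sqrt(p(1 - p)/n) with p = y/n."""
    p = y / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def plus_four_ci(y, n, z=Z):
    """Plus four interval: add 2 successes and 2 failures, then use the Wald formula."""
    return wald_ci(y + 2, n + 4, z)


def exact_coverage(pi, n, ci):
    """Probability that ci(Y, n) contains pi when Y ~ Binomial(n, pi)."""
    coverage = 0.0
    for y in range(n + 1):
        lower, upper = ci(y, n)
        if lower <= pi <= upper:
            coverage += comb(n, y) * pi**y * (1 - pi) ** (n - y)
    return coverage


pi, n = 0.05, 25
print(f"Wald:      {exact_coverage(pi, n, wald_ci):.3f}")       # prints Wald:      0.721
print(f"plus four: {exact_coverage(pi, n, plus_four_ci):.3f}")  # prints plus four: 0.966
```

In this case the plus four coverage is slightly above the nominal 0.95, while the Wald interval falls far short of it.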
The Wald confidence interval for a difference of probabilities π₁ − π₂ is defined by the bounds
$$\hat{\pi}_1 - \hat{\pi}_2 \pm z_{1-\alpha/2}\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}. \qquad (2)$$
It performs better here than when estimating a single probability π, but it still tends to have too small a coverage probability.
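Formula (2) translates directly into code. A minimal sketch; the function name is hypothetical and the example counts (7 successes out of 20 in sample 1, 3 out of 20 in sample 2) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_diff_ci(y1, n1, y2, n2, conf=0.95):
    """Wald interval (2) for pi1 - pi2 from two independent samples."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1 - alpha/2}
    p1, p2 = y1 / n1, y2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


lower, upper = wald_diff_ci(7, 20, 3, 20)
print(f"({lower:.3f}, {upper:.3f})")  # prints (-0.061, 0.461)
```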
Let us assume that the estimated probabilities π̂₁ = n₁₁/n₁ and π̂₂ = n₂₁/n₂ have been calculated from two independent samples with the following outcomes:
outcome     yes        no         Σ
sample 1    n₁₁        n₁₂        n₁
sample 2    n₂₁        n₂₂        n₂
The idea is to add one imaginary success and one imaginary failure to each sample,

outcome     yes        no         Σ
sample 1    n₁₁ + 1    n₁₂ + 1    n₁ + 2
sample 2    n₂₁ + 1    n₂₂ + 1    n₂ + 2

and to calculate a confidence interval by formula (2) from the modified samples.
Such a confidence interval is called an Agresti–Caffo confidence interval. Its coverage probability is close to the intended level even for fairly small sample sizes (e.g. n₁ = n₂ = 20). If the sample sizes are very small (e.g. n₁ = n₂ = 10), the coverage probability of the Agresti–Caffo confidence interval is much too large when the πᵢ are close to 0 or 1, but otherwise it can be satisfactory. The same modification can be used for other confidence levels, not only 95%.
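The Agresti–Caffo interval is formula (2) applied after adding one success and one failure to each sample. The sketch below uses hypothetical names; the counts and the true probabilities π₁ = π₂ = 0.3 are made up for illustration. It computes the interval and its exact coverage for n₁ = n₂ = 20 by enumerating all outcome pairs of the two independent binomial samples.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_diff_ci(y1, n1, y2, n2, conf=0.95):
    """Wald interval (2) for pi1 - pi2."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1 - alpha/2}
    p1, p2 = y1 / n1, y2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - z * se, p1 - p2 + z * se


def agresti_caffo_ci(y1, n1, y2, n2, conf=0.95):
    """Add one success and one failure to each sample, then apply formula (2)."""
    return wald_diff_ci(y1 + 1, n1 + 2, y2 + 1, n2 + 2, conf)


def binom_pmf(y, n, p):
    """Binomial(n, p) probability of y successes."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def exact_coverage_diff(pi1, pi2, n1, n2, ci):
    """P(ci contains pi1 - pi2) for independent Y1 ~ Bin(n1, pi1), Y2 ~ Bin(n2, pi2)."""
    target = pi1 - pi2
    coverage = 0.0
    for y1 in range(n1 + 1):
        for y2 in range(n2 + 1):
            lower, upper = ci(y1, n1, y2, n2)
            if lower <= target <= upper:
                coverage += binom_pmf(y1, n1, pi1) * binom_pmf(y2, n2, pi2)
    return coverage


print(agresti_caffo_ci(7, 20, 3, 20))
print(f"{exact_coverage_diff(0.3, 0.3, 20, 20, agresti_caffo_ci):.3f}")
```

Passing `wald_diff_ci` instead of `agresti_caffo_ci` to `exact_coverage_diff` gives the corresponding coverage of the unmodified Wald interval for comparison.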
The derivations of the improved confidence intervals are omitted here. They employ the exact small-sample distribution of π̂ together with Taylor approximations, or approximations of so-called score confidence intervals.
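One way to see the link to score intervals: the 95% Wilson score interval for a single probability is centred at (y + z²/2)/(n + z²), and with z ≈ 2 this is (y + 2)/(n + 4), the plus four estimate. The sketch below compares the two intervals numerically. It is an illustration only: the Wilson formula is the standard one rather than anything derived in this text, and the example counts (y = 5, n = 20) and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{0.975}, approximately 1.96


def wilson_ci(y, n, z=Z):
    """Wilson score interval for a single probability."""
    centre = (y + z**2 / 2) / (n + z**2)
    half_width = z / (n + z**2) * sqrt(y * (n - y) / n + z**2 / 4)
    return centre - half_width, centre + half_width


def plus_four_ci(y, n, z=Z):
    """Wald formula applied to y + 2 successes out of n + 4 trials."""
    p = (y + 2) / (n + 4)
    half_width = z * sqrt(p * (1 - p) / (n + 4))
    return p - half_width, p + half_width


y, n = 5, 20
print("Wilson:   ", tuple(round(b, 3) for b in wilson_ci(y, n)))
print("plus four:", tuple(round(b, 3) for b in plus_four_ci(y, n)))
```

For these counts the two intervals differ by less than 0.01 at each end.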
Exercise
a) Intelligent extraterrestrial life has been searched for in many projects. In the Phoenix project (1995–2004), radio frequencies from nearly 800 stars were targeted and observed. No anomalies or systematic patterns, i.e. signs of intelligent extraterrestrial life, were detected in the radio waves from any of the stars.¹
Calculate a 95% confidence interval for the proportion of stars with intelligent extraterrestrial life among stars of the kind investigated in the Phoenix project. In the calculation, assume that n = 800. What do you think: does the probability of intelligent extraterrestrial life appear small or large? Would it be a good idea to show the confidence interval to potential financiers of similar endeavours?
b) The Morandi Bridge in Genova, Italy, collapsed on 14.8.2018, killing 43 and injuring 16 people. The bridge had a concrete structure prone to corrosion and damage, and the need for maintenance had been reported.²
The Finnish Transport Infrastructure Agency (Väylävirasto) announced on 12.12.2016 that it would investigate the solidity of concrete in bridges built in 2011–2016 and that the bridges to be investigated would be chosen by random sampling. Yle News reported on 2.1.2017 that the Agency had investigated 18 bridges and found deficiencies in 6 of them.
Calculate a 95% confidence interval for the proportion of bridges built in 2011–2016 with deficiencies in the solidity of their concrete. Note: A confidence
1 https://en.wikipedia.org/wiki/Search_for_extraterrestrial_intelligence and
3 M. Broberg and M. Hakovirta (2009): Lapsistaan erillään asuvana isänä eron jälkeen [As a father living apart from one's children after divorce].