
Theoretical Problem Week6

This document discusses different methods for calculating confidence intervals for proportions and differences in proportions from small sample sizes. It provides formulas and examples comparing the Wald interval, plus four interval, and Agresti-Caffo interval, noting situations where each method performs better or worse in terms of achieving the intended coverage probability.


STATISTICAL INFERENCE (MS-C1620). 8.1.–17.4.2024. Aalto University. University Lecturer Pekka Pere.

Theoretical exercise 5 (week 6)


Theory
The classical Wald confidence interval for a probability π is defined by the bounds

$$\hat{\pi} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}. \qquad (1)$$
If π̂ = 0, interval (1) has width zero and is not meaningful. In such a case the rule of three

$$\left[0,\ \frac{3}{n}\right] \approx \left[0,\ \frac{3}{n+1}\right]$$

yields a valid one-sided 95% confidence interval. Both formulae are approximate; the latter yields better coverage (closer to 95%). Both formulae are applicable if n > 30.
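A minimal Python sketch of interval (1) and of the rule of three, using only the standard library. The function names are hypothetical. The sketch also includes the exact one-sided upper bound for y = 0, i.e. the π that solves (1 − π)^n = α; that bound is not stated in the text and is added here only as a reference point for judging the two approximations.

```python
from statistics import NormalDist


def z_quantile(alpha: float = 0.05) -> float:
    """Standard normal quantile z_{1-alpha/2}; about 1.96 for alpha = 0.05."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def wald_ci(y: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Wald interval (1): pi_hat +/- z_{1-alpha/2} * sqrt(pi_hat * (1 - pi_hat) / n)."""
    p_hat = y / n
    half_width = z_quantile(alpha) * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


def rule_of_three(n: int, plus_one: bool = False) -> tuple[float, float]:
    """One-sided 95% interval for y = 0: [0, 3/n], or [0, 3/(n+1)] if plus_one is True."""
    upper = 3 / (n + 1) if plus_one else 3 / n
    return 0.0, upper


def exact_upper_zero_successes(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper bound when y = 0: the pi solving (1 - pi)^n = alpha."""
    return 1 - alpha ** (1 / n)


n = 50  # hypothetical sample size with no successes observed
print(wald_ci(0, n))                    # degenerate interval of width zero
print(rule_of_three(n))                 # upper bound 3/50
print(rule_of_three(n, plus_one=True))  # upper bound 3/51
print(exact_upper_zero_successes(n))    # exact bound, for comparison only
```

With n = 50 the exact bound is about 0.0582, so 3/51 ≈ 0.0588 lies closer to it than 3/50 = 0.06, in line with the remark that the second approximation gives coverage closer to 95%.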
Formula (1) also fails if π is close to 0 or 1 and n is small. For example, if π = 0.05 and n = 25, the 95% Wald confidence interval defined by the bounds in (1) covers π with a probability of only about 0.75. Newcomb (1998, 868) argues that confidence interval (1) should not be used in scientific research. Andersson (2023), Fagerland et al. (2017, 65), Meeker et al. (2017, 105, 108), and Schilling and Doi (2014) take the same view. The reason is the poor coverage of interval (1) when the sample is small and π is close to 0 or 1.
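Coverage claims of this kind can be checked exactly rather than by simulation: the coverage probability is the sum of the Binomial(n, π) probabilities of those outcomes y whose interval (1) contains π. The sketch below is a minimal illustration under that definition; the function names are hypothetical.

```python
from math import comb
from statistics import NormalDist


def wald_ci(y: int, n: int, z: float) -> tuple[float, float]:
    """Wald interval (1) for y successes out of n, with quantile z."""
    p_hat = y / n
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


def wald_coverage(pi: float, n: int, alpha: float = 0.05) -> float:
    """Exact coverage probability of interval (1) when the true probability is pi."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    coverage = 0.0
    for y in range(n + 1):
        lower, upper = wald_ci(y, n, z)
        if lower <= pi <= upper:
            # Binomial(n, pi) probability of observing exactly y successes
            coverage += comb(n, y) * pi**y * (1 - pi) ** (n - y)
    return coverage


print(wald_coverage(0.05, 25))
```

For π = 0.05 and n = 25 this returns roughly 0.72, far below the nominal 0.95. Most of the shortfall comes from y = 0, which has probability 0.95^25 ≈ 0.28 and yields the degenerate interval [0, 0].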
The plus four confidence interval (Agresti and Coull 1998) is a much better 95% confidence interval if the sample is small. Four imaginary observations, two successes and two failures, are added to a sample of n observations with y successes as follows:

            outcome
            yes        no           Σ
            y + 2      n − y + 2    n + 4

Σ stands for the sum of the frequencies of the outcomes yes and no. The plus four confidence interval is calculated from this modified sample with formula (1), i.e. with (y + 2)/(n + 4) in place of π̂ and n + 4 in place of n. It covers the true π with a probability that tends to be much closer to 0.95 than that of the Wald confidence interval. If π is very close to 0 or 1, however, the plus four confidence interval covers π with too large a probability.
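The plus four recipe can be sketched by applying formula (1) to the modified counts and reusing the exact coverage calculation. This is a minimal illustration with hypothetical function names; alpha is kept as a parameter, but the +2/+4 adjustment is the one the text describes for 95% intervals, and the bounds are not truncated to [0, 1].

```python
from math import comb
from statistics import NormalDist


def plus_four_ci(y: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Formula (1) applied to the modified sample: y + 2 successes out of n + 4."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_tilde = (y + 2) / (n + 4)
    half_width = z * (p_tilde * (1 - p_tilde) / (n + 4)) ** 0.5
    return p_tilde - half_width, p_tilde + half_width


def plus_four_coverage(pi: float, n: int, alpha: float = 0.05) -> float:
    """Exact coverage probability of the plus four interval when the true value is pi."""
    coverage = 0.0
    for y in range(n + 1):
        lower, upper = plus_four_ci(y, n, alpha)
        if lower <= pi <= upper:
            # Binomial(n, pi) probability of observing exactly y successes
            coverage += comb(n, y) * pi**y * (1 - pi) ** (n - y)
    return coverage


print(plus_four_ci(0, 25))  # no longer degenerate when y = 0
print(plus_four_coverage(0.05, 25))
```

For π = 0.05 and n = 25 the coverage comes out at about 0.97, above the nominal 0.95, which illustrates the remark that the plus four interval covers π with too large a probability when π is close to 0 or 1. For y = 0 the lower bound is negative; truncating it at 0 is a common practical adjustment that this sketch leaves out.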
The Wald confidence interval for a difference of probabilities π1 − π2 is defined by the bounds

$$\hat{\pi}_1 - \hat{\pi}_2 \pm z_{1-\alpha/2}\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}. \qquad (2)$$

Interval (2) performs better than interval (1) does for a single probability π, but it still tends to have too small a coverage probability.
Let us assume that the estimated probabilities π̂1 = n11/n1 and π̂2 = n21/n2 have been calculated from two independent samples with the following outcomes:

                outcome
                yes     no      Σ
    sample 1    n11     n12     n1
    sample 2    n21     n22     n2
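A minimal Python sketch of interval (2), taking the four counts of the table as input. The function name and the example counts are hypothetical and serve only to show the calculation.

```python
from statistics import NormalDist


def wald_diff_ci(n11: int, n12: int, n21: int, n22: int,
                 alpha: float = 0.05) -> tuple[float, float]:
    """Wald interval (2) for pi1 - pi2 from the yes/no counts of two independent samples."""
    n1, n2 = n11 + n12, n21 + n22
    p1, p2 = n11 / n1, n21 / n2  # pi_hat_1 = n11/n1, pi_hat_2 = n21/n2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Hypothetical counts: sample 1 has 10 yes / 10 no, sample 2 has 5 yes / 15 no.
print(wald_diff_ci(10, 10, 5, 15))
```

Here π̂1 − π̂2 = 0.5 − 0.25 = 0.25 and the half-width is about 1.96 · √(0.25/20 + 0.1875/20) ≈ 0.29.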

An improvement is to add one imaginary observation to each outcome in both samples

                outcome
                yes        no         Σ
    sample 1    n11 + 1    n12 + 1    n1 + 2
    sample 2    n21 + 1    n22 + 1    n2 + 2

and to calculate a confidence interval by formula (2) from the modified samples. Such a confidence interval is called an Agresti–Caffo confidence interval. Its coverage probability is close to the intended one even for fairly small sample sizes (e.g. n1 = n2 = 20). If the sample sizes are very small (e.g. n1 = n2 = 10), the coverage probability of the Agresti–Caffo confidence interval is much too large when the πi are close to 0 or 1, but otherwise it can be satisfactory. The same modification can be used for other confidence levels (not only 95%).
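Since the Agresti–Caffo interval is formula (2) applied to the modified table, a sketch only needs to shift each count by one before calling the Wald function. The function names and example counts below are hypothetical; the alpha parameter reflects the text's remark that the modification is not tied to the 95% level.

```python
from statistics import NormalDist


def wald_diff_ci(n11: int, n12: int, n21: int, n22: int,
                 alpha: float = 0.05) -> tuple[float, float]:
    """Wald interval (2) for pi1 - pi2 from the yes/no counts of two independent samples."""
    n1, n2 = n11 + n12, n21 + n22
    p1, p2 = n11 / n1, n21 / n2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    return p1 - p2 - z * se, p1 - p2 + z * se


def agresti_caffo_ci(n11: int, n12: int, n21: int, n22: int,
                     alpha: float = 0.05) -> tuple[float, float]:
    """Add one imaginary observation to each of the four cells, then apply formula (2)."""
    return wald_diff_ci(n11 + 1, n12 + 1, n21 + 1, n22 + 1, alpha)


# Hypothetical counts: 10 yes / 10 no in sample 1, 5 yes / 15 no in sample 2.
print(wald_diff_ci(10, 10, 5, 15))
print(agresti_caffo_ci(10, 10, 5, 15))
print(agresti_caffo_ci(10, 10, 5, 15, alpha=0.10))  # 90% level, same modification
```

With these counts the modified point estimate is 11/22 − 6/22 ≈ 0.227 instead of 0.25, so the Agresti–Caffo interval is pulled slightly toward 0 compared with the Wald interval.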
Derivations of the improved confidence intervals are skipped. The derivations employ the exact small-sample distribution of π̂ and Taylor approximations, or approximations of so-called score confidence intervals.

Exercise
a) Intelligent extraterrestrial life has been searched for in many projects. In the Phoenix project (1995–2004), radio frequencies from nearly 800 stars were targeted and observed. No anomalies or systematic patterns, i.e. signs of intelligent extraterrestrial life, were detected in the radio waves from any of the stars.¹
Calculate a 95% confidence interval for the proportion of stars of the kind investigated in the Phoenix project that have intelligent extraterrestrial life. In the calculation, assume that n = 800. What do you think: does the probability of intelligent extraterrestrial life appear small or large? Would it be a good idea to show the confidence interval to potential financiers of similar endeavours?
b) The Morandi Bridge in Genova, Italy, collapsed on 14.8.2018, killing 43 and injuring 16 people. The bridge had a concrete structure prone to corrosion and damage. The need for maintenance had been reported.²
The Finnish Transport Infrastructure Agency (Väylävirasto) announced on 12.12.2016 that it would investigate the solidity of concrete in bridges built in 2011–2016 and that the bridges to be investigated would be chosen by random sampling. Yle News reported on 2.1.2017 that the Agency had investigated 18 bridges and had found deficiencies in 6 of them.
Calculate a 95% confidence interval for the proportion of bridges built in 2011–2016 with deficiencies in the solidity of concrete. Note: a confidence interval valid only for large samples is not asked for; use a confidence interval tailored for small samples.

¹ https://en.wikipedia.org/wiki/Search_for_extraterrestrial_intelligence and https://ui.adsabs.harvard.edu/abs/2004AAS...204.7504B (read 7.2.2024).
² https://en.wikipedia.org/wiki/Ponte_Morandi (read 7.2.2024).
c) Broberg and Hakovirta (2009) surveyed divorced Finnish fathers who do not live with their children.³ The frequencies of meetings between the fathers and their children are tabulated below.

                               frequency of meetings between father and children, nij (%), i = 1, 2
age of the youngest child      every week     less often or not at all     Σ
< 10 years                     8 (42.1)       11 (57.9)                    19 (100)
≥ 10 years                     7 (33.3)       14 (66.7)                    21 (100)
Fathers with children younger than 10 years of age appear to meet their children more often than fathers with older children. Calculate a 95% Wald confidence interval and a 95% Agresti–Caffo confidence interval for the difference in the proportions of fathers in the two groups who meet their children every week. Compare the confidence intervals. Do they cover 0? How do you interpret the results?

³ M. Broberg and M. Hakovirta (2009): Lapsistaan erillään asuvana isänä eron jälkeen. In K. Forssén, A. Haataja and M. Hakovirta (eds.): Yksinhuoltajuus Suomessa. Väestöntutkimuslaitos. Tutkimuksia D 50/2009. The information in the tables has been provided by Mia Hakovirta (personal communication).
