
Course of Study Exercises Statistics


Bachelor Computer Science WS 2020/21

Sheet VIII - Solutions


Statistical Inference
1. An urn contains an unknown number N of balls numbered from 1
to N. The number N is to be estimated. For this purpose one ball is
drawn from the urn and its number is noted. Consider the random
variable X = the number of the drawn ball.

(a) Determine the distribution of X depending on N. Calculate the
expected value and variance of X.
(b) Show that T(X) = 2X − 1 is an unbiased estimator for N.
(c) Calculate for N = 4 and N = 5 the probability that T estimates N
exactly.
(d) Calculate the variance of T.

Answer:
(a) Uniform distribution: $P(X = k) = \frac{1}{N}$ for k = 1, ..., N, i.e.
$E(X) = \frac{N+1}{2}$, $\mathrm{Var}(X) = \frac{N^2-1}{12}$
(b) $E(T(X)) = E(2X - 1) = 2E(X) - 1 = N$
(c) $P(T(X) = N) = P(2X - 1 = N) = P\left(X = \frac{N+1}{2}\right) = \begin{cases} \frac{1}{N} & \text{if } \frac{N+1}{2} \in \mathbb{N} \\ 0 & \text{else} \end{cases}$ ⇒
N = 4: P(T(X) = N) = 0 and N = 5: P(T(X) = N) = 1/5
(d) $\mathrm{Var}(T) = \mathrm{Var}(2X - 1) = 4\,\mathrm{Var}(X) = \frac{N^2-1}{3}$
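The results of this exercise can be checked by enumerating the uniform distribution directly. The following is only a sketch; the helper name urn_check is an assumption made for illustration and does not appear in the original sheet.

```r
# Exact check of the urn exercise by enumerating P(X = k) = 1/N, k = 1..N.
# urn_check is a hypothetical helper name used only for this illustration.
urn_check <- function(N) {
  k <- 1:N
  prob <- rep(1 / N, N)            # uniform distribution of X
  t_val <- 2 * k - 1               # estimator T(X) = 2X - 1
  e_x <- sum(k * prob)
  var_x <- sum((k - e_x)^2 * prob)
  e_t <- sum(t_val * prob)
  var_t <- sum((t_val - e_t)^2 * prob)
  p_hit <- sum(prob[t_val == N])   # P(T(X) = N)
  c(E_X = e_x, Var_X = var_x, E_T = e_t, Var_T = var_t, P_hit = p_hit)
}

urn_check(4)
urn_check(5)
```

For each N the entries E_T and Var_T can be compared with N and (N² − 1)/3, and P_hit with the values from part (c).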

2. Fish are caught from a lake until n (n ≥ 3) fish of a certain
species A have been caught. The random variable X describes the total
number of fish caught up to that point. The lake contains a large
number of fish, so it can be assumed that the ratio p of the number of
fish of species A to the total number of fish in the lake does not
change when some fish are removed from the lake.

(a) Show that $P_p(X = k) = \binom{k-1}{n-1} p^n (1-p)^{k-n}$, k = n, n + 1, . . .

Prof. Dr. Falkenberg, Faculty 2 Statistics WS 20/21


(b) Show that $T(X) = \frac{n-1}{X-1}$ is an unbiased estimator for p.
Answer:

(a) X ∈ {n, n + 1, n + 2, ...}

$P(X = k) = P(\{n-1 \text{ fish of species A are among the first } k-1 \text{ caught fish}\} \cap \{\text{the } k\text{-th caught fish is of species A}\})$
$= \binom{k-1}{n-1} p^{n-1} (1-p)^{(k-1)-(n-1)} \cdot p$
$= \binom{k-1}{n-1} p^{n} (1-p)^{k-n}$

(b)
$E(T(X)) = E\left(\frac{n-1}{X-1}\right)$
$= \sum_{k=n}^{\infty} \frac{n-1}{k-1} \cdot \binom{k-1}{n-1} p^{n} (1-p)^{k-n}$
$= \sum_{k=n}^{\infty} \binom{k-2}{n-2} p^{n} (1-p)^{k-n}$
$= p \cdot \sum_{k=n}^{\infty} \binom{k-2}{n-2} p^{n-1} (1-p)^{k-n}$
$= p \cdot \sum_{k=n-1}^{\infty} \binom{k-1}{(n-1)-1} p^{n-1} (1-p)^{k+1-n}$
$= p \cdot \sum_{k=n-1}^{\infty} P_p(\tilde{X} = k) = p$

with X̃ = number of fish caught until n − 1 fish of species A have been
caught.
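The unbiasedness shown in (b) can also be checked numerically. The sketch below relies on the fact that X − n (the number of fish not of species A) follows R's negative binomial distribution with size = n and prob = p. The helper name expected_T and the truncation point k_max are assumptions made for illustration.

```r
# Numerical check that T(X) = (n - 1) / (X - 1) has expectation p.
# dnbinom(k - n, size = n, prob = p) equals choose(k - 1, n - 1) * p^n * (1 - p)^(k - n).
# expected_T is a hypothetical helper; k_max truncates the infinite sum.
expected_T <- function(n, p, k_max = 5000) {
  k <- n:k_max
  sum((n - 1) / (k - 1) * dnbinom(k - n, size = n, prob = p))
}

expected_T(n = 3, p = 0.2)
expected_T(n = 5, p = 0.6)
```

Up to the negligible truncation error, each call should return the value of p that was passed in.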

Maximum Likelihood Estimation


1. A ticket inspector checks the tickets of passengers on Frankfurt
S-Bahn lines. He keeps checking until he finds a passenger without a
valid ticket. He then collects the increased fare and, after a break,
starts a new check run.

In 10 such check runs, he checked the following numbers of tickets

42 50 40 64 30 36 68 42 46 48

until he found an invalid ticket.

Based on these numbers, determine a maximum likelihood estimate for
the share p of invalid tickets among all checked tickets.
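Answer: ϑ ∈ (0, 1) = share of invalid tickets (called p above)
The random variable X = "number of tickets checked up to and including
the first invalid ticket" is geometrically distributed with parameter ϑ,
i.e. $P(X = k) = (1-\vartheta)^{k-1}\vartheta$, k = 1, 2, ...
Likelihood function:
$L(x_1, \dots, x_n; \vartheta) = \prod_{i=1}^{n} (1-\vartheta)^{x_i-1}\vartheta = \vartheta^{n} (1-\vartheta)^{\left(\sum_{i=1}^{n} x_i\right) - n}$

It is easier to consider the log-likelihood
$f(\vartheta) = \ln L(x_1, \dots, x_n; \vartheta) = n \ln\vartheta + \left(\sum_{i=1}^{n} x_i - n\right) \ln(1-\vartheta)$
From $f'(\vartheta) = \frac{n}{\vartheta} - \frac{\sum_{i=1}^{n} x_i - n}{1-\vartheta} = 0$ we get $\hat{\vartheta} = \frac{n}{\sum_{i=1}^{n} x_i}$. f′ changes sign
from + to − at this point, so it is a local maximum.
Here: $\hat{\vartheta} = \frac{10}{466} = 0.0215$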
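The closed-form estimate can be compared with a direct numerical maximisation of the log-likelihood. This is a minimal sketch using base R's optimize(); the variable names are chosen for illustration only.

```r
# Geometric model: closed-form MLE versus numerical maximisation of f(theta).
x <- c(42, 50, 40, 64, 30, 36, 68, 42, 46, 48)
n <- length(x)

theta_hat <- n / sum(x)   # closed-form estimate n / sum(x_i)

# log-likelihood f(theta) = n ln(theta) + (sum(x_i) - n) ln(1 - theta)
loglik <- function(theta) n * log(theta) + (sum(x) - n) * log(1 - theta)

theta_num <- optimize(loglik, interval = c(1e-6, 1 - 1e-6),
                      maximum = TRUE, tol = 1e-10)$maximum

theta_hat
theta_num
```

Both values should agree to several decimal places, which supports the sign-change argument for the maximum.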

2. A device consists of the components K1, K2 and K3. The device
becomes defective as soon as one or more of the components is defective.
The lifetimes L1, L2 and L3 (in h) of the three components are
independent random variables.

The distribution function of L1 is
$F_1(x) = \begin{cases} 1 - e^{-\lambda x} & \text{for } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$
The distribution functions of L2 and L3 are
$F_2(x) = F_3(x) = \begin{cases} 1 - e^{-\lambda \sqrt[3]{x}} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}$
λ > 0 is an unknown parameter.

(a) Calculate the distribution function and density for the lifetime S
of the device.
(b) The lifetimes of devices taken at random from production were
measured, giving the following values in hours:

82.2 94.0 122.5 95.8 106.4

Use a maximum likelihood estimator to determine an estimate
for λ.

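Answer:

(a)

$P(S \le s) = 1 - P(S > s) = 1 - P(L_1 > s) \cdot P(L_2 > s) \cdot P(L_3 > s) = \begin{cases} 1 - e^{-\lambda (s + 2\sqrt[3]{s})} & \text{for } s > 0 \\ 0 & \text{otherwise} \end{cases}$

Density function: $f(\lambda, s) = \lambda \left(1 + \frac{2}{3\sqrt[3]{s^2}}\right) e^{-\lambda (s + 2\sqrt[3]{s})}$ for s > 0

(b) Likelihood function

$L(s_1, \dots, s_5; \lambda) = \lambda^5 \prod_{i=1}^{5} \left(1 + \frac{2}{3\sqrt[3]{s_i^2}}\right) e^{-\lambda (s_i + 2\sqrt[3]{s_i})}$

Taking the logarithm of the likelihood we get

$f(\lambda) = \ln L(s_1, \dots, s_5; \lambda) = 5 \ln\lambda + \sum_{i=1}^{5} \left[ \ln\left(1 + \frac{2}{3\sqrt[3]{s_i^2}}\right) - \lambda (s_i + 2\sqrt[3]{s_i}) \right]$

Taking the first derivative of f(λ) and setting it to zero,

$f'(\lambda) = \frac{5}{\lambda} - \sum_{i=1}^{5} (s_i + 2\sqrt[3]{s_i}) = 0$

we get a local maximum at

$\hat{\lambda} = \frac{5}{\sum_{i=1}^{5} (s_i + 2\sqrt[3]{s_i})} = 0.00914$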
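The estimate for λ can be reproduced in a few lines of R. This is only a sketch: it evaluates the closed-form solution and compares it with a numerical maximisation of the log-likelihood via optimize(); the variable names are chosen for illustration.

```r
# Device lifetime model: S has density
# lambda * (1 + 2 / (3 * s^(2/3))) * exp(-lambda * (s + 2 * s^(1/3))).
s <- c(82.2, 94.0, 122.5, 95.8, 106.4)
g <- s + 2 * s^(1 / 3)               # g(s_i) = s_i + 2 * cube root of s_i

lambda_hat <- length(s) / sum(g)     # closed-form MLE

# log-likelihood as a function of lambda
loglik <- function(lambda) {
  sum(log(lambda) + log(1 + 2 / (3 * s^(2 / 3))) - lambda * g)
}

lambda_num <- optimize(loglik, interval = c(1e-6, 1),
                       maximum = TRUE, tol = 1e-10)$maximum

lambda_hat
lambda_num
```

The term ln(1 + 2/(3 s_i^(2/3))) does not depend on λ, which is why it drops out of the derivative and the closed-form estimate only involves the sum of the g(s_i).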

3. To determine the number N of red deer living in a certain region,
7 red deer were caught and marked in a trapping action. Afterwards
the animals were released again. After a certain time, another trapping
action was started, in which 3 red deer were caught, 2 of which were
already marked. It is assumed that there is no influx or outflow of red
deer in the region between the actions and that the animals were able
to spread across the region within a short period of time.

(a) Determine a maximum likelihood estimate for the total number
N of red deer living in the region.
(b) A third trapping action was started, in which 8 red deer were
caught, 4 of which were marked. What is now the maximum likelihood
estimate of N?

Answer:

(a) If N denotes the unknown number of red deer and X denotes the
random variable that counts the number of marked red deer caught in
the second trapping action, we have

$P_N(X = 2) = \frac{\binom{7}{2} \binom{N-7}{1}}{\binom{N}{3}}$

The likelihood function L(2; N) is nothing other than this
probability.

N    p = L(2; N)
7    0.00
8    0.38
9    0.50
10   0.52
11   0.51
12   0.48
13   0.44
14   0.40
15   0.37
16   0.34

[Figure: "Likelihood function". Probability of the observation (0.0 to 0.5) plotted against N (10 to 50); blue = first caught, red = first and second caught.]

⇒ The maximum likelihood estimate of N is 10.


(b) Let Y denote the number of marked red deer caught in the third
trapping action. With 7 + 1 = 8 marked animals we have

$P_N(Y = 4) = \frac{\binom{7+1}{4} \binom{N-7-1}{4}}{\binom{N}{8}}$

The probability of both observations is $P_N(X = 2) \cdot P_N(Y = 4)$,
which is the likelihood function L(2, 4; N).

N    p = L(2, 4; N)
9    0
10   0
11   0
12   0.0675
13   0.120
14   0.141
15   0.141
16   0.128
17   0.112
18   0.0951
19   0.0795
20   0.0659
⇒ The maximum likelihood estimate of N is 14.
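Both likelihood tables can be recomputed with the hypergeometric density dhyper(). The following is a sketch under the assumption that N is searched on the grid 8, ..., 60; the grid limits and variable names are illustrative choices, not part of the original solution.

```r
# Capture-recapture likelihoods via the hypergeometric distribution.
# dhyper(x, m, n, k): x marked in a catch of k, with m marked and n unmarked animals.
N <- 8:60                                           # assumed search grid for N

lik_second <- dhyper(2, m = 7, n = N - 7, k = 3)    # P_N(X = 2)
lik_third  <- dhyper(4, m = 8, n = N - 8, k = 8)    # P_N(Y = 4)
lik_joint  <- lik_second * lik_third                # L(2, 4; N)

N[which.max(lik_second)]   # ML estimate after the second trapping action
N[which.max(lik_joint)]    # ML estimate after the third trapping action

round(data.frame(N = N, second = lik_second, joint = lik_joint)[1:13, ], 4)
```

The values for N = 14 and N = 15 of the joint likelihood are very close, so the unrounded numbers are needed to see that the maximum is at N = 14.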

Confidence Intervals
1. A population is known to be normally distributed with a standard
deviation of 2.8.

(a) Compute the 95% confidence interval on the mean based on the
following sample of nine: 8, 9, 10, 13, 14, 16, 17, 20, 21.
(b) Now compute the 99% confidence interval using the same data.

Answer: Assumption: Normal distribution with known standard
deviation σ = 2.8

(a) Wanted: 95% confidence interval for µ
Data: n = 9, $\bar{x} = \frac{1}{9}\sum_{i=1}^{9} x_i = \frac{128}{9} = 14.22$
Confidence interval:
$\bar{x} \pm \frac{\sigma}{\sqrt{n}} \cdot u_{0.975} = 14.22 \pm \frac{2.8}{\sqrt{9}} \cdot 1.96 = 14.22 \pm 1.829 \Rightarrow [12.39, 16.05]$
(b) Wanted: 99% confidence interval for µ
Data: n = 9, $\bar{x} = \frac{1}{9}\sum_{i=1}^{9} x_i = \frac{128}{9} = 14.22$
Confidence interval:
$\bar{x} \pm \frac{\sigma}{\sqrt{n}} \cdot u_{0.995} = 14.22 \pm \frac{2.8}{\sqrt{9}} \cdot 2.5758 = 14.22 \pm \frac{7.2122}{\sqrt{9}} \Rightarrow [11.82, 16.62]$

##################################################
# A population is known to be normally distributed
# with a standard deviation of 2.8.
#
# file: infstatconfintervalnormalmean.R
##################################################

# a) Compute the 95% confidence interval on the mean

sample <- c(8, 9, 10, 13, 14, 16, 17, 20, 21)
alpha <- 0.05
m <- mean(sample)
m
s <- 2.8
qa <- qnorm(1 - alpha / 2, 0, 1)
qa
u <- m - qa * s / sqrt(length(sample))
o <- m + qa * s / sqrt(length(sample))
u; o

# b) Now compute the 99% confidence interval using the same data.

alpha <- 0.01
qa <- qnorm(1 - alpha / 2, 0, 1)
qa
u <- m - qa * s / sqrt(length(sample))
o <- m + qa * s / sqrt(length(sample))
u; o

# Solution applying z.test() from the TeachingDemos package
library(TeachingDemos)
z.test(x = sample, sd = 2.8, alternative = "two.sided", conf.level = 0.95)$conf.int  # a)
z.test(x = sample, sd = 2.8, alternative = "two.sided", conf.level = 0.99)$conf.int  # b)

2. You take a sample of 22 from a population of test scores, and the mean
of your sample is 60.

(a) You know the standard deviation of the population is 10. What
is the 99% confidence interval on the population mean?
(b) Now assume that you do not know the population standard devi-
ation, but the standard deviation in your sample is 10. What is
the 99% confidence interval on the mean now?

Hint: Assume that the test scores follow a normal distribution.

Answer: Assumption: Normal distribution, Data: n = 22, x̄ = 60

(a) Wanted: 99% confidence interval for µ
Assumption: Known standard deviation σ = 10
Confidence interval:
$\bar{x} \pm \frac{\sigma}{\sqrt{n}} \cdot u_{0.995} = 60 \pm \frac{10}{\sqrt{22}} \cdot 2.5758 = 60 \pm 5.492 \Rightarrow [54.508, 65.492]$
(b) Wanted: 99% confidence interval for µ
Assumption: Unknown standard deviation, estimated from the sample by
s = 10 (i.e. the $t_{n-1}$ distribution is used)
Confidence interval:
$\bar{x} \pm \frac{s}{\sqrt{n}} \cdot t_{21,\,0.995} = 60 \pm \frac{10}{\sqrt{22}} \cdot 2.8314 = 60 \pm 6.037 \Rightarrow [53.963, 66.037]$

##################################################
# You take a sample of 22 from a population of test
# scores, and the mean of your sample is 60.
#
# file: infstatconfintervallnormalmeansdunknown.R
###################################################
n <- 22
m <- 60

# a) You know the standard deviation of the population is 10. What
# is the 99% confidence interval on the population mean?
alpha <- 0.01
s <- 10
qa <- qnorm(1 - alpha / 2, 0, 1)
qa
u <- m - qa * s / sqrt(n)
o <- m + qa * s / sqrt(n)
u; o

# Solution applying z.test() from the TeachingDemos package
library(TeachingDemos)
z.test(x = m, sd = 10, alternative = "two.sided", n = 22, conf.level = 0.99)$conf.int

# b) Now assume that you do not know the population standard
# deviation, but the standard deviation in your sample is 10. What
# is the 99% confidence interval on the mean now?
ssample <- 10
ta <- qt(1 - alpha / 2, n - 1)
ta
# use the sample standard deviation together with the t quantile
u <- m - ta * ssample / sqrt(n)
o <- m + ta * ssample / sqrt(n)
u; o

3. Calculate for the below given sample from a normally distributed pop-
ulation the 95% confidence intervals

(a) for the mean, if the standard deviation is 2


(b) for the mean, if the standard deviation is unknown
(c) for the variance, if the mean is 250
(d) for the variance, if the mean is unknown

xi : 247.4, 249.0, 248.5, 247.5, 250.6, 252.2, 253.4, 248.3, 251.4, 246.9,
249.8, 250.6, 252.7, 250.6, 250.6, 252.5, 249.4, 250.6, 247.0, 249.4

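Answer: sample size n = 20, x̄ = 249.92, s = 1.9479, α = 0.05

(a) $\left[\bar{x} - u_{1-\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + u_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}\right] = [249.04, 250.80]$
(b) $\left[\bar{x} - t_{n-1,\,1-\alpha/2} \frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1,\,1-\alpha/2} \frac{s}{\sqrt{n}}\right] = [249.01, 250.83]$
(c) $\left[\frac{Q_n}{\chi^2_{n,\,1-\alpha/2}},\ \frac{Q_n}{\chi^2_{n,\,\alpha/2}}\right] = [2.11, 7.53]$ with $Q_n = \sum_{i=1}^{n} (x_i - \mu)^2 = 72.22$
(d) $\left[\frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}}\right] = [2.19, 8.09]$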

####################################################
# Calculate for the given sample from a normally
# distributed population the 95% confidence intervals
# a) for the mean, if the standard deviation is 2
# b) for the mean, if the standard deviation is unknown
# c) for the variance, if the mean is 250
# d) for the variance, if the mean is unknown
#
# file: infstatconfintervallnormalmusigma.R
#####################################################

# create sample values
# s.values <- round(rnorm(n = 20, mean = 251, sd = 2), 1)
s.values <- c(247.4, 249.0, 248.5, 247.5, 250.6, 252.2, 253.4, 248.3, 251.4, 246.9,
              249.8, 250.6, 252.7, 250.6, 250.6, 252.5, 249.4, 250.6, 247.0, 249.4)
# characteristics of the sample
n <- length(s.values)
xbar <- mean(s.values)
s <- sd(s.values)
# level 1 - alpha
alpha <- 0.05

# confidence intervals for mu
# a) assumption: sigma = 2
sigma <- 2
l.a <- xbar - qnorm(1 - alpha / 2) * sigma / sqrt(n)
u.a <- xbar + qnorm(1 - alpha / 2) * sigma / sqrt(n)
l.a; u.a
# b) assumption: sigma unknown
l.b <- xbar - qt(1 - alpha / 2, df = n - 1) * s / sqrt(n)
u.b <- xbar + qt(1 - alpha / 2, df = n - 1) * s / sqrt(n)
l.b; u.b

# confidence intervals for sigma^2
# c) assumption: mu = 250
mu <- 250
Qn <- sum((s.values - mu)^2)
l.c <- Qn / qchisq(1 - alpha / 2, df = n)
u.c <- Qn / qchisq(alpha / 2, df = n)
l.c; u.c
# d) assumption: mu unknown
l.d <- (n - 1) * s^2 / qchisq(1 - alpha / 2, df = n - 1)
u.d <- (n - 1) * s^2 / qchisq(alpha / 2, df = n - 1)
l.d; u.d

# solutions applying z.test(), sigma.test() from TeachingDemos and t.test()
library(TeachingDemos)
z.test(x = s.values, sd = 2, alternative = "two.sided", conf.level = 0.95)$conf.int  # a)
t.test(x = s.values, alternative = "two.sided", conf.level = 0.95)$conf.int          # b)
sigma.test(x = s.values, alternative = "two.sided", conf.level = 0.95)               # d)

4. At a telemarketing firm, the length of a telephone solicitation (in sec-


onds) is a normally distributed random variable with mean µ and stan-
dard deviation σ, both unknown. A sample of 51 calls has mean length
300 and standard deviation 60.

(a) Construct the 95% confidence upper bound for µ.


(b) Construct the 95% confidence lower bound for σ.

Answer: Sample size n = 51, sample mean x̄ = 300 and sample
standard deviation s = 60

(a) Wanted: Confidence interval for µ at level 1 − α = 95%
In general we have the two-sided confidence interval
$\left[\bar{x} - t_{n-1,\,1-\alpha/2} \cdot \frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1,\,1-\alpha/2} \cdot \frac{s}{\sqrt{n}}\right]$

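A one-sided confidence interval (upper bound):
$\left(-\infty,\ \bar{x} + t_{n-1,\,1-\alpha} \cdot \frac{s}{\sqrt{n}}\right] = \left(-\infty,\ 300 + 1.6759 \cdot \frac{60}{\sqrt{51}}\right] = (-\infty, 314.08]$
with $t_{50,\,0.95} = 1.6759$

(b) Wanted: Confidence interval for σ at level 1 − α = 95%
In general we have the two-sided confidence interval for σ²:
$\left[\frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}}\right]$

A one-sided confidence interval (lower bound) for σ:
$\left[\sqrt{\frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha}}},\ \infty\right) = [51.64, \infty)$ with $\chi^2_{n-1,\,1-\alpha} = \chi^2_{50,\,0.95} = 67.505$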

####################################################
# At a telemarketing firm, the length of a telephone
# solicitation (in seconds) is a normally distributed
# random variable with mean mu and standard deviation
# sigma, both unknown. A sample of 51 calls has mean
# length 300 and standard deviation 60.
#
# file: infstatconfintervaltelefirm.R
#####################################################
n <- 51; m <- 300; ssample <- 60; alpha <- 0.05

# a) Construct the 95% confidence upper bound for mu.
ta <- qt(1 - alpha, n - 1)
ta
o <- m + ta * ssample / sqrt(n)
o

# b) Construct the 95% confidence lower bound for sigma.
chi <- qchisq(1 - alpha, n - 1)
chi
u <- (n - 1) * ssample^2 / chi
sqrt(u)

5. At a certain farm the weight of a peach (in ounces) at harvest time
is a normally distributed random variable with standard deviation 0.5.
How many peaches must be sampled to estimate the mean weight with
a margin of error of ±0.2 and with 95% confidence?
Answer: Standard deviation σ = 0.5 known.
Confidence interval: $\bar{x} \pm \frac{\sigma}{\sqrt{n}} \cdot u_{0.975}$

Wanted: n with $\frac{\sigma}{\sqrt{n}} \cdot u_{0.975} = 0.2$, i.e. $\frac{0.5}{\sqrt{n}} \cdot 1.96 = 0.2$, i.e. $\sqrt{n} = \frac{0.5 \cdot 1.96}{0.2}$, i.e.
$n = \left(\frac{0.5 \cdot 1.96}{0.2}\right)^2 = 4.9^2 = 24.01$ ⇒ n = 25 since we must round up.
########################################################
# At a certain farm the weight of a peach (in ounces)
# at harvest time is a normally distributed random
# variable with standard deviation 0.5. How many peaches
# must be sampled to estimate the mean weight with a
# margin of error pm 0.2 and with 95% confidence.
#
# file: infstatconfintervalpeach.R
########################################################

alpha <- 0.05; s <- 0.5; margin <- 0.2

qa <- qnorm(1 - alpha / 2, 0, 1); qa
n <- ceiling((qa * s / margin)^2)
n

6. You read about a survey in a newspaper and find that 70% of the 250
people sampled prefer candidate A.

(a) Compute the 95% confidence interval.
(b) You are surprised by this survey because you thought that more
like 50% of the population preferred this candidate. Based on this
sample, is 50% a possible population proportion?
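Answer: Data: n = 250, p̂ = 0.70

(a) Wanted: Confidence interval for p at level 1 − α = 0.95
$\hat{p} \pm u_{1-\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
We have: $0.70 \pm 1.96 \cdot \sqrt{\frac{0.70(1-0.70)}{250}} = 0.70 \pm 0.057$
i.e. [0.6432, 0.7568]
(b) 50% is not contained in the 95% confidence interval. Based on this
sample, a population proportion of 50% is therefore not plausible at
this confidence level, although it cannot be ruled out with certainty.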
########################################################
# You read about a survey in a newspaper and find
# that 70% of the 250 people sampled prefer Candidate A.
# a) Compute the 95% confidence interval.
# b) You are surprised by this survey because you thought
# that more like 50% of the population preferred this
# candidate. Based on this sample, is 50% a possible
# population proportion?
#
# file: infstatconfintervalpropsurvey.R
########################################################

n <- 250; p <- 0.7; alpha <- 0.05

# normal approximation
l.appr <- p - qnorm(1 - alpha / 2) * sqrt(p * (1 - p) / n)
u.appr <- p + qnorm(1 - alpha / 2) * sqrt(p * (1 - p) / n)
l.appr; u.appr

# exact
xp <- seq(0, 1, length.out = 1 + 10^4)
l.ex <- xp[min(which(qbinom(1 - alpha / 2, n, xp) == p * n))]
u.ex <- xp[max(which(qbinom(alpha / 2, n, xp) == p * n))]
l.ex; u.ex

# exact confidence interval with R function
binom.test(x = 0.7 * 250, n = 250, conf.level = 1 - alpha)$conf.int

7. A researcher was interested in knowing how many people in the city
supported a new tax. He sampled 100 people from the city and found
that 40% of these people supported the tax. What is the upper limit of
the 95% (one-sided) confidence interval on the population proportion?
Answer: Survey with n = 100, of whom 40% support the tax
Wanted: Upper bound of the confidence interval for a proportion p at
level 1 − α = 0.95: $\hat{p} + u_{1-\alpha} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
We have n = 100, p̂ = 0.4 and 1 − α = 0.95 ⇒ $u_{1-\alpha} = 1.645$, i.e. we
get $0.4 + 1.645 \cdot \sqrt{\frac{0.4 \cdot 0.6}{100}} = 0.48$

##################################################
# A researcher was interested in knowing how many
# people in the city supported a new tax. She sampled
# 100 people from the city and found that 40% of
# these people supported the tax. What is the upper
# limit of the 95% (one-sided) confidence interval
# on the population proportion?
#
# file: infstatconfintervallproponesided.R
##################################################

n <- 100; p <- 0.4; alpha <- 0.05

# normal approximation
u.appr <- p + qnorm(1 - alpha) * sqrt(p * (1 - p) / n)
u.appr

# exact
xp <- seq(0, 1, length.out = 1 + 10^4)
u.ex <- xp[max(which(qbinom(alpha, n, xp) == p * n))]
u.ex

# exact confidence interval with R function
binom.test(x = 40, n = 100, alternative = "less",
           conf.level = 1 - alpha)$conf.int

8. An advertising agency wants to construct a 99% confidence lower bound


for the proportion of dentists who recommend a certain brand of tooth-
paste. The margin of error is to be 0.02. How large should the sample
be?
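Answer: The lower bound at level 1 − α = 0.99 for the proportion
p is denoted z. Thus, we have
$z = \hat{p} - u_{1-\alpha} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ with $u_{1-\alpha} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le 0.02$
α = 0.01 ⇒ $u_{0.99} = 2.326$
n is unknown, thus $2.326 \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le 0.02$, i.e.
$2.326^2 \cdot \hat{p}(1-\hat{p}) \le 0.02^2 \cdot n$, i.e. $n \ge \frac{2.326^2}{0.02^2} \cdot \hat{p}(1-\hat{p})$
For which p̂ does the function y = p̂(1 − p̂) attain its maximum? We take the
derivative: $y = \hat{p} - \hat{p}^2 \Rightarrow y' = 1 - 2\hat{p}$ and then $y' = 1 - 2\hat{p} = 0 \Rightarrow \hat{p} = \frac{1}{2}$.
Thus, we have $y = \hat{p}(1-\hat{p}) \le \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$.
This gives $n \ge \frac{2.326^2}{0.02^2} \cdot \frac{1}{4} \approx 3381.4$, i.e. n = 3382 after rounding up.
If we may suppose that p ≤ 0.25, then $\hat{p}(1-\hat{p}) \le \frac{1}{4} \cdot \frac{3}{4}$ and
$n \ge \frac{2.326^2}{0.02^2} \cdot \frac{1}{4} \cdot \frac{3}{4} \approx 2536.1$, i.e. n = 2537.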
##################################################
# An advertising agency wants to construct a 99%
# confidence lower bound for the proportion of
# dentists who recommend a certain brand of toothpaste.
# The margin of error is to be 0.02. How large should
# the sample be?
#
# file: infstatconfintervalpropsamplesize.R
##################################################

alpha <- 0.01; margin <- 0.02

c <- qnorm(1 - alpha, 0, 1)
f <- seq(0, 1, length.out = 101)
n <- max(ceiling(c^2 * f * (1 - f) / (margin^2)))
n

9. The interval [45.6, 47.8] is a symmetric 99% confidence interval for the
unknown parameter µ based on a sample x1 , . . . , x10 from a normal

distribution $N(\mu, \sigma^2)$ with unknown σ. Calculate the sample mean x̄
and the sample standard deviation s.
Answer: Mean: $\bar{x} = \frac{45.6 + 47.8}{2} = 46.7$, and using the lower limit 45.6 we
get $45.6 = \bar{x} - t_{9,\,0.995} \cdot \frac{s}{\sqrt{n}}$, i.e. $s = \frac{\bar{x} - 45.6}{t_{9,\,0.995}} \cdot \sqrt{n} = \frac{46.7 - 45.6}{3.25} \cdot \sqrt{10} = 1.07$
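The same back-calculation can be done in R with the exact t quantile instead of the table value 3.25. This is a short sketch with variable names chosen for illustration.

```r
# Recover the sample mean and standard deviation from a symmetric t interval.
ci <- c(45.6, 47.8)    # given 99% confidence interval
n <- 10                # sample size
conf_level <- 0.99

x_bar <- mean(ci)                                   # midpoint of the interval
t_q <- qt(1 - (1 - conf_level) / 2, df = n - 1)     # t quantile t_{9, 0.995}
s <- (ci[2] - x_bar) / t_q * sqrt(n)                # half-width = t_q * s / sqrt(n)

x_bar
t_q
s
```

Using the upper limit instead of the lower limit gives the same half-width, because the interval is symmetric around x̄.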

10. The waiting time at the pay desk of a certain supermarket is normally
distributed with mean waiting time µ and known standard deviation
σ = 1.8 minutes. A confidence interval for the mean waiting time
(in minutes) for this supermarket is [5.12, 8.32]. If the sample size is
n = 10, what is the confidence level?
Answer: The length of the interval is 8.32 − 5.12 = 3.2 and
$8.32 - 5.12 = 2 \cdot u_{1-\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 2 \cdot u_{1-\alpha/2} \cdot \frac{1.8}{\sqrt{10}}$, i.e. $u_{1-\alpha/2} = 2.81$, and the
normal distribution table gives $1 - \frac{\alpha}{2} = 0.9975$, i.e. α ≈ 0.005. So the
confidence level is 1 − α = 99.5%.
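Instead of looking up the quantile in a table, the confidence level can be computed with pnorm(). The following is a minimal sketch; the variable names are illustrative.

```r
# Recover the confidence level from the interval length when sigma is known.
ci <- c(5.12, 8.32)    # given confidence interval
sigma <- 1.8           # known standard deviation
n <- 10                # sample size

u <- (ci[2] - ci[1]) / 2 * sqrt(n) / sigma   # quantile u_{1 - alpha/2}
conf_level <- 2 * pnorm(u) - 1               # 1 - alpha = 2 * Phi(u) - 1

u
conf_level
```

The relation 1 − α = 2Φ(u) − 1 follows from 1 − α/2 = Φ(u).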

11. R programming task: Consider an urn with M white balls and N − M
black balls. n balls are drawn without replacement and X denotes the
number of white balls in the sample. N = 500 and n = 50 are known, but M,
the number of white balls, is unknown. Construct a two-sided 1 − α =
0.95 confidence interval for M based on the H(N, M, n) distribution of
X. Compare it with a binomial and a normal approximation.
Answer:
########################################################
# Consider an urn with M white balls and N-M black. n
# balls are drawn without replacement and X denotes the
# number of white balls in the sample. N=500 and n=50
# are known but M the number of white balls is unknown.
# Construct a two-sided 1-alpha=0.95 confidence interval
# for M based on the H(N,M,n)-distribution of X. Compare
# it with a binomial and a normal approximation.
#
# file: infstatconfintervalhypergeoM.R
########################################################

library(tidyverse)

# urn model: N total number of balls, M = number of white
# balls, n = number of drawn balls
# X = number of white balls ~ H(N,M,n)
N <- 500
n <- 50
alpha <- 0.05

# symmetric intervals [lb, ub] for X with probability 1-alpha
# for different values of M
sy.intervals <- tibble(
  M = 0:N,
  # quantiles of H(N,M,n)
  lb = qhyper(alpha / 2, M, N - M, n),
  ub = qhyper(1 - alpha / 2, M, N - M, n)
)
# plot of the intervals
plot(x = sy.intervals$M, y = sy.intervals$lb, col = "blue",
     type = "p",
     xlab = "M", ylab = "lower and upper bounds",
     main = "symmetric 95% intervals for X")
points(x = sy.intervals$M, y = sy.intervals$ub, col = "red")

# Note that the lb- and ub-functions are not strictly monotonically
# increasing: use for a given value of X the min of the

# corresponding ub values and the max of the corresponding lb
# values of M as an inverse of the two functions. These values
# are the bounds of the confidence intervals.
ex.conf.intervall <- function(x) {
  return(c(
    sy.intervals %>%
      filter(ub == x) %>%
      mutate(l = min(M)) %>%
      select(l) %>%
      unique() %>%
      as.numeric(),
    sy.intervals %>%
      filter(lb == x) %>%
      mutate(u = max(M)) %>%
      select(u) %>%
      unique() %>%
      as.numeric()
  ))
}

# The binom.test(x, n) function returns in the variable
# conf.int the confidence interval for p=M/N if there are x
# white balls in a sample of n balls drawn from the urn
# with replacement
binom.appr.conf.intervall <- function(x) {
  return(
    c(
      binom.test(x, n, conf.level = 1 - alpha)$conf.int[1] * N,
      binom.test(x, n, conf.level = 1 - alpha)$conf.int[2] * N
    )
  )
}

# normal approximation of the confidence interval for an
# unknown proportion if x white balls are in a sample of
# n balls drawn with replacement
normal.appr.conf.intervall <- function(x) {
  return(
    c(
      N * (x / n - qnorm(1 - alpha / 2) * sqrt(x * (1 - x / n) / n^2)),
      N * (x / n + qnorm(1 - alpha / 2) * sqrt(x * (1 - x / n) / n^2))
    )
  )
}

# tibble of the bounds of the confidence intervals for M
# for all possible values of X
tab <- tibble(
  X = 0:n) %>%
  group_by(X) %>%
  mutate(ex.lb = ex.conf.intervall(X)[1],
         ex.ub = ex.conf.intervall(X)[2],
         binom.lb = binom.appr.conf.intervall(X)[1],
         binom.ub = binom.appr.conf.intervall(X)[2],
         norm.lb = normal.appr.conf.intervall(X)[1],
         norm.ub = normal.appr.conf.intervall(X)[2]
  )

# plot of all bounds
plot(x = tab$X, y = tab$ex.lb, col = "red",
     xlab = "x", ylab = "M",
     main = "95% confidence interval for M in H(N=500,M,n=50)",
     sub = "red = exact, blue = binomial approx, black = normal approx.")
points(x = tab$X, y = tab$ex.ub, col = "red")
points(x = tab$X, y = tab$binom.lb, col = "blue")
points(x = tab$X, y = tab$binom.ub, col = "blue")
points(x = tab$X, y = tab$norm.lb, col = "black")
points(x = tab$X, y = tab$norm.ub, col = "black")
